HIGH THROUGHPUT OR CAPILLARY-BASED
SCREENING FOR A BIOACTIVITY OR BIOMOLECULE

RELATED APPLICATIONS 
This application claims the benefit of priority under 35 U.S.C. §119(e)
of U.S. Provisional Application Serial No. 60/399,272, filed July 26, 2002. This 
application is also a continuation-in-part application ("CIP") of U.S. Patent 
Applications Serial No. ("USSN") 09/975,036, filed October 10, 2001, now pending, 
and this application is also a CIP of USSN 10/145,281, filed May 13, 2002, now 
pending, which is a divisional (DIV) of USSN 09/985,432, filed October 10, 2000, 
now pending, which is a CIP of USSN 09/444,112, filed November 22, 1999, now
pending, which is a CIP of USSN 09/098,206, issued as U.S. Patent No. 6,174,673,
filed June 16, 1998, which is a CIP of USSN 08/876,276, filed June 16, 1997, now pending.
Each of the aforementioned applications is explicitly incorporated herein by
reference in its entirety and for all purposes.

FIELD OF THE INVENTION 
The present invention relates generally to screening of mixed 
populations of organisms or nucleic acids and more specifically to the identification 
of bioactive molecules and bioactivities using screening techniques, including high
throughput screening and a capillary array platform for screening samples. The
invention provides a culture-independent approach to directly clone genes encoding 
novel enzymes from environmental samples containing a mixed population of 
organisms. The invention provides a novel high throughput cultivation method based 
on the combination of a single cell encapsulation procedure with flow cytometry that 
enables cells to grow with nutrients that are present at environmental concentrations. 

BACKGROUND 
There is a critical need in the chemical industry for efficient catalysts 
for the practical synthesis of optically pure materials; enzymes can provide the 
optimal solution. All classes of molecules and compounds that are utilized in both 
established and emerging chemical, pharmaceutical, textile, food and feed, and detergent
markets must meet stringent economic and environmental standards. The synthesis
of polymers, pharmaceuticals, natural products and agrochemicals is often hampered 
by expensive processes which produce harmful byproducts and which suffer from low 
enantioselectivity (Faber, 1995; Tonkovich and Gerber, U.S. Dept of Energy study,
1995). Enzymes have a number of remarkable advantages which can overcome these
problems in catalysis: they act on single functional groups, they distinguish between 
similar functional groups on a single molecule, and they distinguish between 
enantiomers. Moreover, they are biodegradable and function at very low mole 
fractions in reaction mixtures. Because of their chemo-, regio- and stereospecificity,
enzymes present a unique opportunity to optimally achieve desired selective
transformations. These are often extremely difficult to duplicate chemically, 
especially in single-step reactions. The elimination of the need for protecting groups,
the selectivity, and the ability to carry out multi-step transformations in a single reaction
vessel, along with the concomitant reduction in environmental burden, have led to the
increased demand for enzymes in chemical and pharmaceutical industries (Faber,
1995). Enzyme-based processes have been gradually replacing many conventional 
chemical-based methods (Wrotnowski, 1997). More widespread industrial use is
currently limited primarily by the relatively small number of
commercially available enzymes. Only ~300 enzymes (excluding DNA-modifying
enzymes) are at present commercially available from the >3000 non-DNA-modifying
enzyme activities thus far described.

The use of enzymes for technological applications also may require 
performance under demanding industrial conditions. This includes activities in 
environments or on substrates for which the currently known arsenal of enzymes was
not evolutionarily selected. Enzymes have evolved by selective pressure to perform
very specific biological functions within the milieu of a living organism, under 
conditions of mild temperature, pH and salt concentration. For the most part, the
non-DNA-modifying enzyme activities thus far described (Enzyme Nomenclature, 1992)
have been isolated from mesophilic organisms, which represent a very small fraction
of the available phylogenetic diversity (Amann et al., 1995). The dynamic field of
biocatalysis takes on a new dimension with the help of enzymes isolated from 
microorganisms that thrive in extreme environments. Such enzymes must function at 



09010-400001 (DIVER 1280-36) 

temperatures above 100 °C in terrestrial hot springs and deep sea thermal vents, at 
temperatures below 0 °C in arctic waters, in the saturated salt environment of the 
Dead Sea, at pH values around 0 in coal deposits and geothermal sulfur-rich springs, 
or at pH values greater than 11 in sewage sludge (Adams and Kelly, 1995). The
enzymes may also be obtained from: geothermal and hydrothermal fields, acidic
soils, solfatara and boiling mud pots, pools, hot springs and geysers where the
enzymes are neutral to alkaline, marine actinomycetes, metazoan, endo- and
ectosymbionts, tropical soil, temperate soil, arid soil, compost piles, manure piles,
marine sediments, freshwater sediments, water concentrates, hypersaline and
supercooled sea ice, arctic tundra, the Sargasso Sea, open ocean pelagic, marine snow,
microbial mats (such as whale falls, springs and hydrothermal vents), insect and
nematode gut microbial communities, plant endophytes, epiphytic water samples,
industrial sites and ex situ enrichments. Additionally, the enzymes may be isolated
from eukaryotes, prokaryotes, myxobacteria (epothilone), air, water, sediment, soil or
rock. Enzymes obtained from these extremophilic organisms open a new field in
biocatalysis. 

For example, several esterases and lipases cloned and expressed from 
extremophilic organisms are remarkably robust, showing high activity throughout a 
wide range of temperatures and pHs. The fingerprints of several of these esterases
show a diverse substrate spectrum, in addition to differences in the optimum reaction
temperature. Certain esterases recognize only short chain substrates while others
act only on long chain substrates, and their optimal reaction temperatures differ
widely. These results demonstrate that more diverse enzymes fulfilling the need
for new biocatalysts can be found by screening biodiversity. Substrates upon which
enzymes act are herein defined as bioactive substrates.

Furthermore, virtually all of the enzymes known so far have come 
from cultured organisms, mostly bacteria and more recently archaea (Enzyme 
Nomenclature, 1992). Traditional enzyme discovery programs rely solely on cultured 
microorganisms for their screening programs and are thus only accessing a small
fraction of natural diversity. Several recent studies have estimated that only a small
percentage, conservatively less than 1%, of organisms present in the natural
environment have been cultured (see Table I, Amann et al., 1995, Barns et al., 1994,
Torsvik, 1990). For example, Norman Pace's laboratory recently reported extensive
untapped diversity in water and sediment samples from the "Obsidian Pool" in 
Yellowstone National Park, a spring which has been studied since the early 1960's by 
microbiologists (Barns, 1994). Amplification and cloning of 16S rRNA encoding
sequences revealed mostly unique sequences with little or no representation of the
organisms which had previously been cultured from this pool. This demonstrates 
substantial diversity of archaea with so far unknown morphological, physiological and 
biochemical features which may be useful in industrial processes. David Ward's
laboratory in Bozeman, Montana has performed similar studies on the cyanobacterial
mat of Octopus Spring in Yellowstone Park and came to the same conclusion, namely,
tremendous uncultured diversity exists (Bateson et al., 1989). Giovannoni et al. 
(1990) reported similar results using bacterioplankton collected in the Sargasso Sea 
while Torsvik et al. (1990) have shown by DNA reassociation kinetics that there is 
considerable diversity in soil samples. Hence, this vast majority of microorganisms
represents an untapped resource for the discovery of novel biocatalysts. In order to
access this potential catalytic diversity, recombinant screening approaches are 
required. 

Bacteria and many eukaryotes have a coordinated mechanism for 
regulating genes whose products are involved in related processes. The genes are
clustered, in structures referred to as "gene clusters," on a single chromosome and are
transcribed together under the control of a single regulatory sequence, including a 
single promoter which initiates transcription of the entire cluster. The gene cluster, 
the promoter, and additional sequences that function in regulation altogether are 
referred to as an "operon" and can include up to 30 or more genes, usually from 2 to 6
genes. Thus, a gene cluster is a group of adjacent genes that are either identical or
related, usually as to their function. 

Some gene families consist of one or more identical members. 
Clustering is a prerequisite for maintaining identity between genes, although clustered 
genes are not necessarily identical. Gene clusters range from extremes where a
duplication is generated of adjacent related genes to cases where hundreds of identical
genes lie in a tandem array. Sometimes no significance is discernable in a repetition
of a particular gene. A principal example of this is the expressed duplicate insulin
genes in some species, whereas a single insulin gene is adequate in other mammalian
species. 

It is important to further research gene clusters and the extent to which 
the full length of the cluster is necessary for the expression of the proteins resulting
therefrom. Gene clusters undergo continual reorganization and, thus, the ability to
create heterogeneous libraries of gene clusters from, for example, bacterial or other 
prokaryote sources is valuable in determining sources of novel proteins, particularly 
including enzymes such as, for example, the polyketide synthases that are responsible 
for the synthesis of polyketides having a vast array of useful activities. As indicated,
other types of proteins and molecules that are the product(s) of gene clusters are also
contemplated, including, for example, antibiotics, antivirals, antitumor agents and 
regulatory proteins, such as insulin. 

Polyketides are molecules which are an extremely rich source of 
bioactivities, including antibiotics (such as tetracyclines and erythromycin),
anti-cancer agents (daunomycin), immunosuppressants (FK506 and rapamycin), and
veterinary products (monensin). Many polyketides (produced by polyketide 
synthases) are valuable as therapeutic agents. Polyketide synthases are 
multifunctional enzymes that catalyze the biosynthesis of a huge variety of carbon 
chains differing in length and patterns of functionality and cyclization. Polyketide
synthase genes fall into gene clusters and at least one type (designated type I) of
polyketide synthase has large genes and encoded enzymes, complicating
genetic manipulation and in vitro studies of these genes/proteins. The method(s) of
the present invention facilitate the rapid discovery of these gene clusters in gene 
expression libraries. 

Gene libraries of microorganisms have been prepared for the purpose
of identifying genes involved in biosynthetic pathways that produce
medicinally-active metabolites and specialty chemicals. These pathways require multiple proteins
(specifically, enzymes), entailing greater complexity than the single proteins used as 
drug targets. For example, genes encoding pathways of bacterial polyketide synthases
(PKSs) were identified by screening gene libraries of the organism (Malpartida et al.
1984, Nature 309:462; Donadio et al. 1991, Science 252:675-679). PKSs catalyze
multiple steps of the biosynthesis of polyketides, an important class of therapeutic
compounds, and control the structural diversity of the polyketides produced. A
host-vector system in Streptomyces has been developed that allows directed mutation and
expression of cloned PKS genes (McDaniel et al. 1993, Science 262:1546-1550; Kao 
et al. 1994, Science 265:509-512). This specific host-vector system has been used to
develop more efficient ways of producing polyketides, and to rationally develop novel
polyketides (Khosla et al., WO 95/08548). 

Another example is the production of the textile dye, indigo, by 
fermentation in an E. coli host. Two operons containing the genes that encode the 
multienzyme biosynthetic pathway have been genetically manipulated to improve
production of indigo by the foreign E. coli host (see, e.g., Ensley et al. 1983, Science
222:167-169; Murdock et al. 1993, Bio/Technology 11:381-386). Overall,
conventional studies of heterologous expression of genes encoding a metabolic 
pathway involve directed cloning, sequence analysis, designed mutations, and 
rearrangement of specific genes that encode proteins known to be involved in
previously characterized metabolic pathways.

In view of numerous advances in the understanding of disease 
mechanisms and identification of drug targets, there is an increasing need for 
innovative strategies and methods for rapidly identifying lead compounds and 
channeling them toward clinical testing. The methods of the present invention
facilitate the rapid discovery of genes, gene pathways and gene clusters, particularly
polyketide synthase genes, polyketide synthase gene pathways and polyketides, from 
gene expression libraries. 

Of particular interest are cellular "switches" known as receptors which 
interact with a variety of biomolecules, such as hormones, growth factors, and
neurotransmitters, to mediate the transduction of an "external" cellular signaling event
into an "internal" cellular signal. External signaling events include the binding of a 
ligand to the receptor, and internal events include the modulation of a pathway in the 
cytoplasm or nucleus involved in the growth, metabolism or apoptosis of the cell. 
Internal events also include the inhibition or activation of transcription of certain
nucleic acid sequences, resulting in the increase or decrease in the production or
presence of certain molecules (such as nucleic acid, proteins, and/or other molecules
affected by this increase or decrease in transcription). Drugs to cure disease or
alleviate its symptoms can activate or block any of these events to achieve a desired
pharmaceutical effect. 

Transduction can be accomplished by a transducing protein in the cell 
membrane which is activated upon an allosteric change that the receptor may undergo
upon binding to a specific biomolecule. The "active" transducing protein activates
production of so-called "second messenger" molecules within the cell, which then 
activate certain regulatory proteins within the cell that regulate gene expression or 
alter some metabolic process. Variations on the theme of this "cascade" of events 
occur. For example, a receptor may act as its own transducing protein, or a
transducing protein may act directly on an intracellular target without mediation by a
second messenger. 

Signal transduction is a fundamental area of inquiry in biology. For 
instance, ligand/receptor interactions and the receptor/effector coupling mediated by 
guanine nucleotide-binding proteins (G-proteins) are of interest in the study of
disease. A large number of G protein-linked receptors funnel extracellular signals as
diverse as hormones, growth factors, neurotransmitters, primary sensory stimuli, and 
other signals through a set of G proteins to a small number of second-messenger 
systems. The G proteins act as molecular switches with an "on" and "off" state
governed by a GTPase cycle. Mutations in G proteins may result in either
constitutive activation or loss of expression.

Many receptors convey messages through heterotrimeric G proteins, of 
which at least 17 distinct forms have been isolated. Additionally, there are several 
different G protein-dependent effectors. The signals transduced through the 
heterotrimeric G proteins in mammalian cells influence intracellular events through
the action of effector molecules.

Given the variety of functions subserved by G protein-coupled signal 
transduction, it is not surprising that abnormalities in G protein-coupled pathways can 
lead to diseases with manifestations as dissimilar as blindness, hormone resistance, 
precocious puberty and neoplasia. G-protein-coupled receptors are extremely
important to drug research efforts. It is estimated that up to 60% of today's
prescription drugs work by somehow interacting with G protein-coupled receptors.
However, these drugs were developed using classical medicinal chemistry and
without a knowledge of the molecular mechanism of action. A more efficient drug
discovery program could be deployed by targeting individual receptors and making 
use of information on gene sequence and biological function to develop effective 
therapeutics. 

Several groups have reported cells which express mammalian G
proteins or subunits thereof, along with mammalian receptors which interact with
these molecules. For example, WO92/05244 (April 2, 1992) describes a transformed 
yeast cell which is incapable of producing a yeast G protein subunit, but which has 
been engineered to produce both a mammalian G protein subunit and a mammalian
receptor which interacts with the subunit. The authors found that a modified version
of a specific mammalian receptor integrated into the membrane of the cell, as shown 
by studies of the ability of isolated membranes to interact properly with various 
known agonists and antagonists of the receptor. Ligand binding resulted in G protein- 
mediated signal transduction. 

Another group has described the functional expression of a mammalian
adenylyl cyclase in yeast, and the use of the engineered yeast cells in identifying
potential inhibitors or activators of the mammalian adenylyl cyclase (WO 95/30012). 
Adenylyl cyclase is among the best studied of the effector molecules which function 
in mammalian cells in response to activated G proteins. "Activators" of adenylyl
cyclase cause the enzyme to become more active, elevating the cAMP signal of the
yeast cell to a detectable degree. "Inhibitors" cause the cyclase to become less active, 
reducing the cAMP signal to a detectable degree. The method describes the use of the 
engineered yeast cells to screen for drugs which activate or inhibit adenylyl cyclase 
by their action on G protein-coupled receptors. 

When attempting to identify genes encoding bioactivities of interest
from complex mixed population nucleic acid libraries, the rate limiting steps in
discovery occur at both the DNA cloning level and the screening level. Screening
of complex mixed population libraries which contain, for example, hundreds of different
organisms requires the analysis of several million clones to cover this genomic
diversity. An extremely high-throughput screening method has been developed to
handle the enormous numbers of clones present in these libraries. 
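
As a rough illustration of why several million clones may be needed, the following sketch applies the standard Clarke-Carbon library coverage estimate, N = ln(1 - P) / ln(1 - f), which is not part of the original disclosure. The function name, the organism count, the genome size, and the insert size are hypothetical values chosen only for illustration, and the calculation assumes that every genome is equally represented in the library.

```python
import math


def clones_needed(n_genomes, genome_bp, insert_bp, p_cover=0.99):
    """Estimate clones required so a given locus is present with probability p_cover.

    Uses the Clarke-Carbon estimate N = ln(1 - P) / ln(1 - f), where f is the
    fraction of the pooled genomic DNA carried by one insert. Assumes equal
    abundance of every genome (an idealization for environmental samples).
    """
    if not 0.0 < p_cover < 1.0:
        raise ValueError("p_cover must be strictly between 0 and 1")
    f = insert_bp / (n_genomes * genome_bp)
    if not 0.0 < f < 1.0:
        raise ValueError("insert must be smaller than the pooled genomic DNA")
    # log1p(-f) keeps precision when f is very small
    return math.ceil(math.log(1.0 - p_cover) / math.log1p(-f))


if __name__ == "__main__":
    # Hypothetical library: 300 organisms, 4 Mb genomes, 3 kb inserts.
    print(clones_needed(n_genomes=300, genome_bp=4_000_000, insert_bp=3_000))
```

Under these assumed values the estimate comes out at roughly 1.8 million clones for 99% coverage. Unequal abundance of organisms in a real sample would push the required number higher.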

In traditional flow cytometry, it is common to analyze very large
numbers of eukaryotic cells in a short period of time. Newly developed flow 
cytometers can analyze and sort up to 20,000 cells per second. In a typical flow 
cytometer, individual particles pass through an illumination zone and appropriate
detectors, gated electronically, measure the magnitude of a pulse representing the
extent of light scattered. The magnitudes of these pulses are sorted electronically into
"bins" or "channels", permitting the display of histograms of the number of cells
possessing a certain quantitative property versus the channel number (Davey and Kell, 
1996). It was recognized early on that the data accruing from flow cytometric
measurements could be analyzed (electronically) rapidly enough that electronic
cell-sorting procedures could be used to sort cells with desired properties into separate
"buckets", a procedure usually known as fluorescence-activated cell sorting (Davey 
and Kell, 1996). 
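
The binning and sorting described above can be sketched in a few lines. This is a minimal illustration only, not the electronics of any actual instrument: the function names, the 256-channel resolution, the full-scale pulse value, and the gate limits are all assumptions introduced here.

```python
from collections import Counter


def channel_of(pulse, max_pulse=1024.0, n_channels=256):
    """Map a pulse magnitude onto a channel index in 0..n_channels-1."""
    if pulse < 0:
        raise ValueError("pulse magnitude cannot be negative")
    index = int(pulse / max_pulse * n_channels)
    return min(index, n_channels - 1)  # off-scale pulses go to the top channel


def channel_histogram(pulses, max_pulse=1024.0, n_channels=256):
    """Count events per channel, as displayed in a flow cytometry histogram."""
    return Counter(channel_of(p, max_pulse, n_channels) for p in pulses)


def sort_events(pulses, gate_low, gate_high, max_pulse=1024.0, n_channels=256):
    """Split events into a 'sorted' and a 'waste' bucket by a channel gate."""
    buckets = {"sorted": [], "waste": []}
    for p in pulses:
        ch = channel_of(p, max_pulse, n_channels)
        key = "sorted" if gate_low <= ch <= gate_high else "waste"
        buckets[key].append(p)
    return buckets


if __name__ == "__main__":
    pulses = [0.0, 3.9, 4.0, 512.0, 1023.9, 2000.0]  # made-up pulse magnitudes
    print(sorted(channel_histogram(pulses).items()))
    print(sort_events(pulses, gate_low=100, gate_high=200))
```

The gate here is a single channel range on one parameter. A real sorter typically combines gates on several parameters, which this sketch does not attempt to model.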

Fluorescence-activated cell sorting has been primarily used in studies
of human and animal cell lines and the control of cell culture processes. Fluorophore
labeling of cells and measurement of the fluorescence can give quantitative data about 
specific target molecules or subcellular components and their distribution in the cell 
population. Flow cytometry can quantitate virtually any cell-associated property or 
cell organelle for which there is a fluorescent probe (or natural fluorescence). The
parameters which can be measured have previously been of particular interest in
animal cell culture. 

Flow cytometry has also been used in cloning and selection of variants 
from existing cell clones. This selection, however, has required stains that diffuse 
through cells passively, rapidly and irreversibly, with no toxic effects or other
influences on metabolic or physiological processes. Since, typically, flow sorting has
been used to study animal cell culture performance, physiological state of cells, and 
the cell cycle, one goal of cell sorting has been to keep the cells viable during and 
after sorting. 

There currently are no reports in the literature of screening and
discovery of recombinant enzymes in E. coli expression libraries by fluorescence
activated cell sorting of single cells. Furthermore there are no reports of recovering 
DNA encoding bioactivities screened by expression screening in E. coli using a FACS
machine. The present invention provides these methods to allow the extremely rapid
screening of viable or non-viable cells to recover desirable activities and the nucleic 
acid encoding those activities. 

A limited number of papers describing various applications of flow
cytometry in the field of microbiology and sorting of fluorescence activated
microorganisms have, however, been published (Davey and Kell, 1996). Fluorescence
and other forms of staining have been employed for microbial discrimination and 
identification, and in the analysis of the interaction of drugs and antibiotics with 
microbial cells. Flow cytometry has been used in aquatic biology, where
autofluorescence of photosynthetic pigments is used in the identification of algae or
DNA stains are used to quantify and count marine populations (Davey and Kell,
1996). Thus, Diaper and Edwards used flow cytometry to detect viable bacteria after 
staining with a range of fluorogenic esters including fluorescein diacetate (FDA) 
derivatives and ChemChrome B, a proprietary stain sold commercially for the detection
of viable bacteria in suspension (Diaper and Edwards, 1994). Labeled antibodies and
oligonucleotide probes have also been used for these purposes. 

Papers have also been published describing the application of flow 
cytometry to the detection of native and recombinant enzymatic activities in 
eukaryotes. Betz et al. studied native (non-recombinant) lipase production by the
eukaryote Rhizopus arrhizus with flow cytometry. They found that spore suspensions
of the mold were heterogeneous as judged by light-scattering data obtained with 
excitation at 633 nm, and they sorted clones of the subpopulations into the wells of 
microtiter plates. After germination and growth, lipase production was automatically 
assayed (turbidimetrically) in the microtiter plates, and a representative set of the
most active were reisolated, cultured, and assayed conventionally (Betz et al., 1984).

Srienc et al. have reported a flow cytometric method for detecting
cloned β-galactosidase activity in the eukaryotic organism S. cerevisiae. The ability of
flow cytometry to make measurements on single cells means that individual cells with
high levels of expression (e.g., due to gene amplification or higher plasmid copy
number) could be detected. In the method reported, a non-fluorescent compound
(β-naphthol-β-galactopyranoside) is cleaved by β-galactosidase and the liberated
naphthol is trapped to form an insoluble fluorescent product. The insolubility of the
fluorescent product is of great importance here to prevent its diffusion from the cell.
Such diffusion would not only lead to an underestimation of β-galactosidase activity
in highly active cells but could also lead to an overestimation of enzyme activity in
inactive cells or those with low activity, as they may take up the leaked fluorescent
compound, thus reducing the apparent heterogeneity of the population.

One group has described the use of a FACS machine in an assay 
detecting fusion proteins expressed from a specialized transducing bacteriophage in 
the prokaryote Bacillus subtilis (see, e.g., Chung et al., J. of Bacteriology, Apr. 1994,
p. 1977-1984; Chung et al., Biotechnology and Bioengineering, Vol. 47, pp. 234-242
(1995)). This group monitored the expression of a lacZ gene (encoding
beta-galactosidase) fused to the sporulation loci in B. subtilis (spo). The technique used to
monitor beta-galactosidase expression from spo-lacZ fusions in single cells involved
taking samples from a sporulating culture, staining them with a commercially
available fluorogenic substrate for beta-galactosidase called C8-FDG, and
quantitatively analyzing fluorescence in single cells by flow cytometry. In this study,
the flow cytometer was used as a detector to screen for the presence of the spo gene 
during the development of the cells. The device was not used to screen and recover 
positive cells from a gene expression library or nucleic acid for the purpose of 
discovery. 

Another group has utilized flow cytometry to distinguish between the
developmental stages of the delta-proteobacterium Myxococcus xanthus (F. Russo-Marie
et al., PNAS, Vol. 90, pp. 8194-8198, September 1993). As in the previously
described study, this study employed the capabilities of the FACS machine to detect
and distinguish genotypically identical cells in different development regulatory
states. The screening of an enzymatic activity was used in this study as an indirect
measure of developmental changes. 

The lacZ gene from E. coli is often used as a reporter gene in studies of 
gene expression regulation, such as those to determine promoter efficiency, the effects 
of trans-acting factors, and the effects of other regulatory elements in bacterial, yeast,
and animal cells. Using a chromogenic substrate, such as ONPG
(o-nitrophenyl-β-D-galactopyranoside), one can measure expression of β-galactosidase
in cell cultures; but
it is not possible to monitor expression in individual cells and to analyze the
heterogeneity of expression in cell populations. The use of fluorogenic substrates,
however, makes it possible to determine β-galactosidase activity in a large number of
individual cells by means of flow cytometry. This type of determination can be more
informative with regard to the physiology of the cells, since gene expression can be
correlated with the stage in the mitotic cycle or the viability under certain conditions.
In 1994, Plovins et al. reported the use of fluorescein-di-β-D-galactopyranoside
(FDG) and C12-FDG as substrates for β-galactosidase detection in animal, bacterial,
and yeast cells. This study compared the two molecules as substrates for
β-galactosidase, and concluded that FDG is a better substrate for β-galactosidase
detection by flow cytometry in bacterial cells. The screening performed in this study
was for the comparison of the two substrates. The detection capabilities of a FACS 
machine were employed to perform the study on viable bacterial cells. 

Cells with chromogenic or fluorogenic substrates yield colored and 
fluorescent products, respectively. Previously, it had been thought that the flow
cytometry-fluorescence activated cell sorter approaches could be of benefit only for
the analysis of cells that contain intracellularly, or are normally physically associated
with, the enzymatic activity or small molecule of interest. On this basis, one could
only use fluorogenic reagents which could penetrate the cell and which are thus
potentially cytotoxic. To avoid clumping of heterogeneous cells, it is desirable in
flow cytometry to analyze only individual cells, and this could limit the sensitivity
and therefore the concentration of target molecules that can be sensed. Weaver and his
colleagues at MIT and others have developed the use of gel microdroplets containing 
(physically) single cells which can take up nutrients, secrete products, and grow to
form colonies. The diffusional properties of gel microdroplets may be made such that
sufficient extracellular product remains associated with each individual gel
microdroplet, so as to permit flow cytometric analysis and cell sorting on the basis of
concentration of secreted molecule within each microdroplet. Beads have also been 
used to isolate mutants growing at different rates, and to analyze antibody secretion 
by hybridoma cells and the nutrient sensitivity of hybridoma cells. The gel
microdroplet method has also been applied to the rapid analysis of mycobacterial
growth and its inhibition by antibiotics.

The gel microdroplet technology has had significance in amplifying
the signals available in flow cytometric analysis, and in permitting the screening of 
microbial strains in strain improvement programs for biotechnology. Wittrup et al.
(Biotechnol. Bioeng. (1993) 42:351-356) developed a microencapsulation selection
method which allows the rapid and quantitative screening of >10^6 yeast cells for
enhanced secretion of Aspergillus awamori glucoamylase. The method provides a 
400-fold single-pass enrichment for high-secretion mutants. 
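
To give a sense of what a 400-fold single-pass enrichment means in practice, the following sketch works through the arithmetic. The source reports only the single-pass figure; the starting frequency of one mutant per 10^6 cells, the definition of enrichment as a ratio of population fractions, and the assumption that each further pass achieves the same fold enrichment are all hypothetical and introduced here for illustration.

```python
def enrich(fraction, fold):
    """Fraction of desired cells after one pass, capped at a pure population."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if fold <= 1.0:
        raise ValueError("fold enrichment must exceed 1")
    return min(1.0, fraction * fold)


def passes_needed(start_fraction, fold, target_fraction):
    """Count sorting passes until the desired cells reach target_fraction."""
    if not 0.0 < target_fraction <= 1.0:
        raise ValueError("target_fraction must be in (0, 1]")
    passes = 0
    fraction = start_fraction
    while fraction < target_fraction:
        fraction = enrich(fraction, fold)  # assumes the same fold on every pass
        passes += 1
    return passes


if __name__ == "__main__":
    # Hypothetical start: one high-secretion mutant per million cells.
    print(enrich(1e-6, 400))
    print(passes_needed(1e-6, 400, 0.5))
```

In this idealized model a mutant present at one in a million would need three passes to make up at least half of the population. Real enrichment usually falls off in later passes, so the figure is best read as a lower bound.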

Gel microdroplet or other related technologies can be used in the 
present invention to localize as well as amplify signals in the high throughput
screening of recombinant libraries. Cell viability during the screening is not an issue
or concern since nucleic acid can be recovered from the microdroplet. 

Different types of encapsulation strategies and compounds or polymers 
can be used with the present invention. For instance, high temperature agaroses can 
be employed for making microdroplets stable at high temperatures, allowing stable
encapsulation of cells subsequent to heat kill steps utilized to remove all background
activities when screening for thermostable bioactivities. 

There are several hurdles which must be overcome when attempting to 
detect and sort E. coli expressing recombinant enzymes, and recover encoding nucleic 
acids. FACS systems have typically been based on eukaryotic separations and have
not been refined to accurately sort single E. coli cells; the low forward and sideward
scatter of small particles like E. coli reduces the accuracy of sorting; enzyme
substrates typically used in automated screening approaches, such as umbelliferyl
based substrates, diffuse out of E. coli at rates which interfere with quantitation.
Further, recovery of very small amounts of DNA from sorted organisms can be
problematic.

There has been a dramatic increase in the need for bioactive 
compounds with novel activities. This demand has arisen largely from changes in 
worldwide demographics coupled with the clear and increasing trend in the number of 
pathogenic organisms that are resistant to currently available antibiotics as well as the 
need for new industrial processes for synthesis of compounds. For example, while
there has been a surge in demand for antibacterial drugs in emerging nations with
young populations, countries with aging populations, such as the U.S., require a
growing repertoire of drugs against cancer, diabetes, arthritis and other debilitating
conditions. The death rate from infectious diseases increased 58% between 1980
and 1992, and it has been estimated that the emergence of antibiotic resistant microbes
has added in excess of $30 billion annually to the cost of health care in the U.S. alone
(see, e.g., Adams et al., Chemical and Engineering News, 1995; Amann et al.,
Microbiological Reviews, 59, 1995). As a response to this trend, pharmaceutical
companies have significantly increased their screening of microbial diversity for 
compounds with unique activities or specificities. 

The majority of bioactive compounds currently in use are derived from
soil microorganisms. Many microbes inhabiting soils and other complex ecological
communities produce a variety of compounds that increase their ability to survive and
proliferate. These compounds are generally thought to be nonessential for growth of
the organism and are synthesized with the aid of genes involved in intermediary
metabolism. Such secondary metabolites that influence the growth or survival of
other organisms are known as "bioactive" compounds and serve as key components of
the chemical defense arsenal of both micro- and macroorganisms. Humans have
exploited these compounds for use as antibiotics, antiinfectives and other bioactive
compounds with activity against a broad range of prokaryotic and eukaryotic
pathogens (Barnes et al., Proc. Natl. Acad. Sci. U.S.A., 91, 1994).

The approach currently used to screen microbes for new bioactive
compounds has been largely unchanged since the inception of the field. New isolates
of bacteria, particularly gram positive strains from soil environments, are collected 
and their metabolites tested for pharmacological activity. 

There is still tremendous biodiversity that remains untapped as the
source of lead compounds. However, the currently available methods for screening
and producing lead compounds cannot be applied efficiently to these under-explored
resources. For instance, it is estimated that at least 99% of marine bacteria species do
not survive on laboratory media, and commercially available fermentation equipment
is not optimal for use in the conditions under which these species will grow; hence
these organisms are difficult or impossible to culture for screening or re-supply.
Recollection, growth, strain improvement, media improvement and scale-up
production of the drug-producing organisms often pose problems for synthesis and
development of lead compounds. Furthermore, the need for the interaction of specific
organisms to synthesize some compounds makes their use in discovery extremely 
difficult. New methods to harness the genetic resources and chemical diversity of 
these untapped sources of compounds for use in drug discovery are very valuable. 

A central tenet of modern biology is that genetic information resides in a
nucleic acid genome, and that the information embodied in such a genome (i.e., the
genotype) directs cell function. This occurs through the expression of various genes in
the genome of an organism and regulation of the expression of such genes. The
expression of genes in a cell or organism defines the cell or organism's physical
characteristics (i.e., its phenotype). This is accomplished through the translation of
genes into proteins. Determining the biological activity of a protein obtained from an
environmental sample can provide valuable information about the role of proteins in the
environment. In addition, such information can help in the development of biologics,
diagnostics, therapeutics, and compositions for industrial applications.

In the United States, cancer is the second leading cause of disease-related
deaths, second only to cardiovascular disease, and it is projected to become the
leading cause of death within a few years. The most common curative therapies for
cancers found at an early stage include surgery and radiation (1). These methods are
not nearly as successful in the more advanced stages of cancer. Current
chemotherapeutic agents have been useful but are limited in their effectiveness.
Significant results are obtained with chemotherapy in a small range of cancers 
including childhood cancers and certain adult malignancies such as lymphoma and 
leukemia (2). Despite these positive results, most chemotherapeutic treatments are 
not curative and serve primarily as palliatives (1). Thus, it is clear that current
medical science still has a long way to go before providing long-term survival to
patients and curability of most cancers. However, basic research over the past 20 
years has provided a vast amount of scientific information defining key players in the 
progression of cancers. Understanding the disease processes at the molecular level 
provides the means to determine optimal molecular targets and presumably selectively
kill cancerous tissues. Some of the key areas that have been identified in the
progression of tumors include proliferative signal transduction, aberrant cell-cycle
regulation, apoptosis, telomere biology, genetic instability and angiogenesis (3). This
basic research is now beginning to pay off as progress towards more effective
treatments is beginning to emerge (4,5). New chemotherapeutic agents directed 
against these identified areas are in Phase I-III clinical trials with some of the most 
promising agents active against tyrosine kinases involved in signal transduction.
Small molecule inhibitors of Bcr-abl, protein kinase C, VEGF receptors, and EGF
receptors, to name a few, are all in clinical trials (4). Some specific examples include 
the EGF receptor inhibitors, ZD1839 and CP358774, which are in Phase II trials and 
appear to be well tolerated by patients with positive signs of clinical activity (6). 
Even with this progress, the complexities of tumorigenesis necessitate not only the
ongoing discovery and development of novel therapeutic agents but also the basic
research to elucidate the underlying mechanisms of the disease. Presently, there are at
least 50 known cancer related targets and it has been speculated that there may be up
to several hundred new targets discovered (2). To make use of this influx of
information, novel methods for the ultra high throughput screening of potential
anti-cancer drugs must be developed.

Recent technological developments in molecular biology, automation, 
miniaturization, and information technology have facilitated the high throughput 
screening of novel compounds from a variety of sources. However, despite the 
increased throughput, there is some disappointment in the industry regarding the
number of novel drugs that have resulted from these efforts (7). One of the significant
challenges is to find sufficient numbers of compounds with the structural diversity 
necessary to increase the chances of finding activity at the molecular target. 
Currently, screened compounds come from chemical and combinatorial libraries, 
historical compound collections and natural product libraries (8). Of these, one of the
richest sources of drugs has been natural product libraries. Cragg et al. (9)
reported that over 60% of the approved anticancer drugs and pre-NDA candidates 
between 1984 and 1995 were from natural sources or derived from natural products. 
In fact, it is estimated that 39% of all 520 new approved drugs during this time period 
were from or derived from natural products with 80% of anti-infectives coming from
nature. Typically, natural products are small molecules that have a much greater
structural diversity than those produced by most combinatorial approaches. Small
molecules in general are favored by the pharmaceutical industry because they are more
"drug-like" in nature with the ability to penetrate tumors, be absorbed, and metabolized easily.
However, natural products have their disadvantages, largely due to the reproducibility 
of the source, the labor-intensive extraction process, the abundance of the supply, and 
the concerns over rights to biodiversity (8). 

The therapeutic agents from natural sources have been primarily of
plant and microbial origins. Of these, the greatest biodiversity exists in the
microorganisms that populate virtually every corner of the earth. The approach
currently used to screen microbes for new bioactive compounds has changed little
over the last 50 years. Microbiologists collect samples from the environment, isolate
a pure culture, grow up sufficient material, extract the culture, and test its
metabolites for pharmacological activity. Variations of these natural products can
then be generated through mutagenesis of the producing organism or through 
chemical or biochemical modification of the original backbone molecules. Natural
products are typically made by multi-enzyme systems in which each enzyme carries
out one of the many transformations required to make the final small molecule
products, an example being antibiotics. These bioactive molecules are derived from
the organism's ability to produce secondary metabolites in response to the specific
needs and challenges of its local environment. The genes encoding these enzymes
are often clustered into so-called "biosynthetic operons" which contain the blueprint
for building a natural product (10). This blueprint for production of a small bioactive
molecule is typically more than 25,000 nucleotides and can be greater than 100,000
nucleotides. There are many examples of entire pathways encoding the
production of such small molecules as oxytetracycline, jadomycin and daunorubicin, to
name just a few, that have been cloned as contiguous pieces of DNA from a
producing organism (11). Some of these pathways (e.g., actinorhodin,
tetracenomycin, puromycin, nikkomycin) have been transferred to other microbial
hosts and the small molecule heterologously expressed (11).

A more recent approach has been to use recombinant techniques to
synthesize hybrid antibiotic pathways by combining gene subunits from previously
characterized pathways. This approach, called "combinatorial biosynthesis", has been
focused primarily on the polyketide antibiotics and has resulted in a number of
compounds which have displayed activity (12, 13). In one such approach using the
erythronolide biosynthetic operon, enzymatic domains have been added to (14) and
repositioned within the operon (15), thereby reprogramming polyketide biosynthesis.
However, compounds with novel antibiotic activities have not yet been reported, an
observation that may be due to the fact that the pathway subunits are derived from
those encoding previously characterized compounds. What has not been accounted
for in previous attempts to discover novel bioactive compounds is the relatively recent
observation that only a small fraction of microbes in natural environments can be
grown under laboratory conditions. Estimates are that far less than 1% of all
prokaryotes are capable of being grown in pure culture in the laboratory. This implies
a need for culture-independent methods for bioactive compound discovery.

Culture-independent approaches to directly clone genes encoding both 
target enzymes and other bioactive molecules from environmental samples are based 
on the construction of libraries which represent the collective genomes of naturally 
occurring organisms, archived in cloning vectors that can be propagated in E. coli,
Streptomyces, or other suitable hosts. Because the cloned DNA is initially extracted
directly from environmental samples containing a mixed population of organisms, the
representation of the libraries is not limited to the small fraction of prokaryotes that
can be grown in pure culture, nor is it biased towards a few rapidly growing species.
Samples can be obtained from virtually all ecosystems represented on earth, including
such extreme environments as geothermal and hydrothermal vents, acidic soils and
boiling mud pots, contaminated industrial sites, marine symbionts, etc. 

Screening of complex mixed population libraries containing, for 
example, 100 different organisms requires the analysis of tens of millions of clones to 
cover the genomic diversity. An extremely high throughput screening method must
be implemented to handle the enormous numbers of clones present in these libraries.
In the pharmaceutical industry today, high throughput screening typically has
throughput rates on the order of 10,000 compounds per assay per day with some
laboratories working at 100,000 assays per day. Most of the development in the
industry has centered around the miniaturization and automation of these screens to
higher density, smaller volume plate formats. However, this strategy could be
reaching the practical limits of conventional liquid-dispensing technology and current
microplate fabrication processes, as well as the limits in controlling evaporation in
open systems with very small well volumes.

Current platforms for screening micro-scale particles of interest
include plates that are formed with small wells, or through-holes. The wells or
through-holes are used to hold a sample to be analyzed. The sample typically
contains the particles of interest. When wells are used, complex and inefficient
sample delivery and extraction systems must be used in order to deposit the sample
into the wells on the plate, and remove the sample from the wells for further analysis.
Well-based platforms have a bottom, and gravity is primarily used for
suspending the sample on the plate to develop the particulate or incubate cells of
interest.

Another type of platform uses through-holes, which are typically 
machined into a plate by one of a number of well-known methods. Through-holes 
rely on capillary forces for introducing the sample to the plate, and utilize surface
tension for suspending the sample in the through-holes. However, typical
through-hole-based devices are limited to relatively small aspect ratios (the ratio of
length to internal diameter of the hole). A small aspect ratio yields greater evaporative
loss of a liquid contained in the hole, and such evaporation is difficult to control.
Through-holes are also limited in their functionality. For example, the process of forming
through-holes in a plate usually does not allow for the use of various materials to line
the inside of the holes, or to clad the outside of the holes.

Fluorescence and other forms of staining have been employed for
microbial discrimination and identification, and in the analysis of the interaction of
drugs and antibiotics with microbial cells. Flow cytometry has been used in aquatic
biology, where autofluorescence of photosynthetic pigments is used in the
identification of algae or DNA stains are used to quantify and count marine
populations (Davey and Kell, 1996). Diaper and Edwards used flow cytometry to
detect viable bacteria after staining with a range of fluorogenic esters including
fluorescein diacetate (FDA) derivatives and ChemChrome B, a stain sold commercially
for the detection of viable bacteria in suspension (Diaper and Edwards, 1994).
Labeled antibodies and oligonucleotide probes can also be used for these purposes.

Papers have been published describing the application of flow
cytometry to the detection of native and recombinant enzymatic activities in
eukaryotes. Betz et al. studied native (non-recombinant) lipase production by the
eukaryote Rhizopus arrhizus with flow cytometry. They found that spore suspensions
of the mold were heterogeneous as judged by light-scattering data obtained with
excitation at 633 nm, and they sorted clones of the subpopulations into the wells of
microtiter plates. After germination and growth, lipase production was automatically
assayed (turbidimetrically) in the microtiter plates, and a representative set of the
most active were reisolated, cultured, and assayed conventionally (Betz et al., 1984).
The ability of flow cytometry to make measurements on single cells means that
individual cells with high levels of expression (e.g., due to gene amplification or
higher plasmid copy number) could be detected.

Cells with chromogenic or fluorogenic substrates yield colored and
fluorescent products, respectively. Previously, it had been thought that the flow
cytometry-fluorescence activated cell sorter approaches could be of benefit only for
the analysis of cells that contain intracellularly, or are normally physically associated
with, the enzymatic activity of a molecule of interest. On this basis, one could only
use fluorogenic reagents which could penetrate the cell and which are thus potentially
cytotoxic. In addition, gel microdroplets (GMDs) can be used during FACS sorting
and culturing. The use of GMDs containing (physically) single cells which can take
up nutrients, secrete products, and grow to form colonies is useful in the present
invention. The diffusional properties of GMDs may be made such that sufficient
extracellular product remains associated with each individual GMD, so as to permit
flow cytometric analysis and cell sorting on the basis of concentration of secreted
molecule within each microdroplet. Beads have also been used to isolate mutants
growing at different rates, and to analyze antibody secretion by hybridoma cells and
the nutrient sensitivity of hybridoma cells.

The gel microdroplet (GMD) technology has had significance in
amplifying the signals available in flow cytometric analysis, and in permitting the
screening and sorting of microbial strains in strain improvement and isolation
programs. GMD or other related technologies can be used in the present invention to
localize, sort as well as amplify signals in the high throughput screening of
recombinant libraries. Cell viability during the screening is not an issue or concern
since nucleic acid can be recovered from the microdroplet. 

There is currently a need in the biotechnology and chemical industry for
molecules that can optimally carry out biological or chemical processes (e.g., enzymes).
Identifying novel enzymes in a mixed population environmental sample is one solution
to this problem. By rapidly identifying polypeptides having an activity of interest and
polynucleotides encoding the polypeptide of interest, the invention provides methods,
compositions and sources for the development of biologics, diagnostics, therapeutics,
and compositions for industrial applications.

All classes of molecules and compounds that are utilized in both established
and emerging chemical, pharmaceutical, textile, food and feed, and detergent markets must
meet economical and environmental standards. The synthesis of polymers,
pharmaceuticals, natural products and agrochemicals is often hampered by expensive
processes which produce harmful byproducts and which suffer from poor or inefficient
catalysis. Enzymes, for example, have a number of remarkable advantages which can
overcome these problems in catalysis: they act on single functional groups, they
distinguish between similar functional groups on a single molecule, and they distinguish
between enantiomers. Moreover, they are biodegradable and function at very low mole
fractions in reaction mixtures. Because of their chemo-, regio- and stereospecificity,
enzymes present a unique opportunity to optimally achieve desired selective
transformations. These are often extremely difficult to duplicate chemically, especially
in single-step reactions. The elimination of the need for protection groups, selectivity,
the ability to carry out multi-step transformations in a single reaction vessel, along with
the concomitant reduction in environmental burden, has led to the increased demand for
enzymes in chemical and pharmaceutical industries. Enzyme-based processes have been
gradually replacing many conventional chemical-based methods. A current limitation to
more widespread industrial use is primarily due to the relatively small number of
commercially available enzymes. Only ~300 enzymes (excluding DNA modifying
enzymes) are at present commercially available from the >3000 non-DNA-modifying
enzyme activities thus far described.

The use of enzymes for technological applications also may require
performance under demanding industrial conditions. This includes activities in
environments or on substrates for which the currently known arsenal of enzymes was not
evolutionarily selected. However, the natural environment provides extreme conditions
including, for example, extremes in temperature and pH. A number of organisms have
adapted to these conditions due in part to selection for polypeptides that can withstand
these extremes.

Enzymes have evolved by selective pressure to perform very specific 
biological functions within the milieu of a living organism, under conditions of 
temperature, pH and salt concentration. For the most part, the non-DNA modifying 
enzyme activities thus far described have been isolated from mesophilic organisms,
which represent a very small fraction of the available phylogenetic diversity. The
dynamic field of biocatalysis takes on a new dimension with the help of enzymes
isolated from microorganisms that thrive in extreme environments. For example, such
enzymes must function at temperatures above 100°C in terrestrial hot springs and deep
sea thermal vents, at temperatures below 0°C in arctic waters, in the saturated salt
environment of the Dead Sea, at pH values around 0 in coal deposits and geothermal
sulfur-rich springs, or at pH values greater than 11 in sewage sludge. Environmental
samples obtained, for example, from extreme conditions containing organisms, 
polynucleotides or polypeptides (e.g., enzymes) open a new field in biocatalysis. 

In addition to the need for new enzymes for industrial use, there has been a
dramatic increase in the need for bioactive compounds with novel activities. This
demand has arisen largely from changes in worldwide demographics coupled with the
clear and increasing trend in the number of pathogenic organisms that are resistant to
currently available antibiotics. For example, while there has been a surge in demand for
antibacterial drugs in emerging nations with young populations, countries with aging
populations, such as the U.S., require a growing repertoire of drugs against cancer,
diabetes, arthritis and other debilitating conditions. The death rate from infectious
diseases increased 58% between 1980 and 1992, and it has been estimated that the
emergence of antibiotic resistant microbes has added in excess of $30 billion annually to
the cost of health care in the U.S. alone (Adams et al., Chemical and Engineering
News, 1995; Amann et al., Microbiological Reviews, 59, 1995). As a response to this
trend, pharmaceutical companies have significantly increased their screening of microbial
diversity for compounds with unique activities or specificity.

The majority of bioactive compounds currently in use are derived from
soil microorganisms. Many microbes inhabiting soils and other complex ecological
communities produce a variety of compounds that increase their ability to survive and
proliferate. These compounds are generally thought to be nonessential for growth of the
organism and are synthesized with the aid of genes involved in intermediary metabolism,
hence their name, "secondary metabolites". Secondary metabolites are generally the
products of complex biosynthetic pathways and are usually derived from common
cellular precursors. Secondary metabolites that influence the growth or survival of other
organisms are known as "bioactive" compounds and serve as key components of the
chemical defense arsenal of both micro- and macro-organisms. Humans have exploited
these compounds for use as antibiotics, antiinfectives and other bioactive compounds
with activity against a broad range of prokaryotic and eukaryotic pathogens.
Approximately 6,000 bioactive compounds of microbial origin have been characterized,
with more than 60% produced by the gram positive soil bacteria of the genus
Streptomyces (Barnes et al., Proc. Natl. Acad. Sci. U.S.A., 91, 1994). Of these, at least
70 are currently used for biomedical and agricultural applications. The largest class of
bioactive compounds, the polyketides, includes a broad range of antibiotics,
immunosuppressants and anticancer agents which together account for sales of over $5
billion per year.

Despite the seemingly large number of available bioactive compounds, it
is clear that one of the greatest challenges facing modern biomedical science is the
proliferation of antibiotic resistant pathogens. Because of their short generation time and
ability to readily exchange genetic information, pathogenic microbes have rapidly
evolved and disseminated resistance mechanisms against virtually all classes of
antibiotic compounds. For example, there are virulent strains of the human pathogens
Staphylococcus and Streptococcus that can now be treated with but a single antibiotic,
vancomycin, and resistance to this compound would require only the transfer of a single
gene, vanA, from resistant Enterococcus species (Bateson et al.,
System. Appl. Microbiol., 12, 1989). When this crucial need for novel antibacterial
compounds is superimposed on the growing demand for enzyme inhibitors,
immunosuppressants and anti-cancer agents, it becomes readily apparent why
pharmaceutical companies have stepped up their screening of microbial samples for
bioactive compounds. 

Conventional screening methods include liquid phase, microtiter plate
based assays. The format for liquid phase assays is often robotically manipulated 96-,
384-, or 1536-well microtiter plates. Although these microtiter plate based screening
technologies are being used successfully, limitations do exist. The primary limitation is
throughput, as these techniques generally allow the screening of only about 10^5 to 10^6
clones/day/instrument. For example, a typical screen of 100,000 wells on a microtiter
based HTS system requires 261 384-well microtiter plates and over 24 hours of
equipment time. While 1536-well or greater plate formats are growing in
popularity, the majority of companies involved in HTS continue to use 384-well plates,
as this technology is reliable and standardized. While these throughputs may be more
than sufficient for screening isolate and low-complexity libraries, it could take more than
a year to thoroughly screen one complex gene library. Clearly, higher throughput
screening technology is necessary.
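
The plate count and screening time quoted in the preceding paragraph can be checked with simple arithmetic. The following is a minimal illustrative sketch, not part of the described method: the function names are hypothetical, and the library size of 5 x 10^7 clones is an assumed figure chosen only to represent "tens of millions" of clones.

```python
import math


def plates_needed(wells: int, wells_per_plate: int) -> int:
    """Number of microtiter plates required to hold the given number of wells."""
    return math.ceil(wells / wells_per_plate)


def days_to_screen(library_clones: float, clones_per_day: float) -> float:
    """Instrument-days needed to screen a library at a fixed daily throughput."""
    return library_clones / clones_per_day


# Figure from the text: a 100,000-well screen on 384-well plates.
plates_384 = plates_needed(100_000, 384)

# Assumed (hypothetical) complex library of 5e7 clones, screened at the
# lower and upper throughputs given in the text (10^5 and 10^6 clones/day).
days_low = days_to_screen(5e7, 1e5)
days_high = days_to_screen(5e7, 1e6)

print(plates_384, days_low, days_high)
```

Under these assumptions, the lower throughput figure gives a screening time of well over a year on a single instrument, which is consistent with the estimate stated above.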

Other screening methods include growth selection (Snustad et al., 1988;
Lundberg et al., 1993; Yano et al., 1998), colorimetric screening of bacterial colonies or
phage plaques (Kuritz, 1999), in vitro expression cloning (King et al., 1997) and cell
surface or phage display (Benhar, 2001). Each of these systems has limitations. Solid
phase colorimetric plate screening of colonies or plaques is limited by relatively low
throughput. Even with the use of microcolonies/plaques and automated imaging and
clone recovery, thorough screening of complex libraries is impractical. Cell surface
and/or phage display technologies suffer from structural limitations of the displayed
molecule. Often the size and/or shape of the displayed molecule is restricted by the
display technology. One of the highest throughput screening methods, growth selection,
is also limited in its scope of usefulness. Assay conditions, temperature and pH, are 
limited by the growth parameters of the host strain. Molecular interactions are often 
constrained by the host cell membranes and/or cell wall, as substrate must be presented 
to intracellular enzymes. In addition, "false positives" or a high level of "background"
are a common occurrence in many selection assays. With respect to screening for
improved variants in GSSM™ or GeneReassembly libraries, growth selection is seldom
quantitative.

Classification of microorganisms based on rRNA analysis has shown that
the majority of microbes present in nature have no counterpart among previously
cultured organisms. Establishing the metabolic properties and potential of this microbial
diversity in the absence of pure culture presents an immense challenge for microbial
ecologists. Although 16S rRNA studies combined with genomic analyses of naturally
occurring marine bacterioplankton have suggested the existence of novel metabolic
functions, a comprehensive understanding of the physiology of these organisms, and of
the complex environmental processes in which they engage, will undoubtedly require
their cultivation.

Conventional cultivation of microorganisms is laborious, time consuming
and, most importantly, selective and biased for the growth of specific microorganisms.
The majority of cells obtained from nature and visualized by microscopy are viable, but
they do not generally form visible colonies on plates. This may reflect the artificial
conditions inherent in most culture media, for example extremely high substrate
concentrations, or the lack of specific nutrients required for growth. Consistent with this,
it was shown recently that certain previously uncultivable microorganisms could be 
grown in pure culture if provided with the chemical components of their natural 
environment. 

SUMMARY OF THE INVENTION 
The present invention comprises methods for high throughput
screening for biomolecules of interest. In one aspect, the invention provides methods
for isolating and maintaining a cell from a mixed population of uncultivated cells
comprising: (a) encapsulating in a microenvironment at least a single cell from the
mixed population; (b) placing the encapsulated cell in a growth column; and (c)
incubating the encapsulated cell in the growth column under conditions allowing the
encapsulated cell to survive and be maintained, thereby isolating and maintaining the
cell. In one aspect, the mixed population of uncultivated cells comprises an
environmental sample, such as a sample from, or derived from, geothermal fields,
hydrothermal fields, acidic soils, solfatara mud pots, boiling mud pots, pools, hot
springs, geysers, marine actinomycetes, metazoan, endosymbionts, ectosymbionts,
tropical soil, temperate soil, arid soil, compost piles, manure piles, marine sediments,
freshwater sediments, water concentrates, hypersaline sea ice, super-cooled sea ice,
arctic tundra, Sargasso Sea, open ocean pelagic, marine snow, microbial mats, whale
falls, springs, hydrothermal vents, insect and nematode gut microbial communities,
plant endophytes, epiphytic water samples, industrial sites and/or ex situ enrichments.
In one aspect, the environmental sample is a eukaryote, prokaryote, myxobacteria
(epothilone), and/or isolated from or derived from air, water, sediment, soil and/or
rock.

In one aspect, the mixed population of uncultivated cells comprises a 
mixture of materials. The mixture of materials can comprise a biological sample, soil 
or sludge. In one aspect, the biological sample comprises a plant sample, a food 
sample, a gut sample, a salivary sample, a blood sample, a sweat sample, a urine
sample, a spinal fluid sample, a tissue sample, a vaginal swab, a stool sample, an 
amniotic fluid sample and/or a buccal mouthwash sample. 

In one aspect, a cell from a mixed population of uncultivated cells can 
comprise a microorganism, such as a bacterial cell, a yeast cell, an archaeal cell, a 
plant cell, a mammalian cell, an insect cell or a protozoan cell, or a virus or a phage.
The cell can comprise an extremophile, such as hyperthermophiles, psychrophiles, 
halophiles, psychrotrophs, alkalophiles, acidophiles and the like. 

In one aspect, the cells are encapsulated in a gel microdroplet (GMD), 
e.g., a porous gel microdroplet (GMD), a liposome, a ghost cell, or any equivalent. 
The porous gel microdroplet (GMD) can comprise a hydrogel matrix, or equivalent,
or a selectively permeable membrane. In one aspect, the porous gel microdroplet
(GMD) comprises a CELMIX™ emulsion matrix, or equivalent, or a CELGEL™
encapsulation matrix, or equivalent.

In one aspect, one cell is encapsulated in each gel microdroplet
(GMD), or one to four cells can be encapsulated in each gel microdroplet (GMD).
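
As an illustration only, the occupancy of gel microdroplets can be estimated
under the assumption (not stated in this specification) that cells distribute
among microdroplets according to a Poisson distribution. The function names and
the mean loading values below are hypothetical. The sketch shows how a low mean
number of cells per microdroplet favors single-cell encapsulation at the cost of
many empty microdroplets.

```python
import math


def poisson_pmf(k: int, mean_cells: float) -> float:
    """Probability that one microdroplet holds exactly k cells (Poisson model, an assumption)."""
    return math.exp(-mean_cells) * mean_cells ** k / math.factorial(k)


def occupancy_summary(mean_cells: float) -> dict:
    """Summarize microdroplet occupancy for a given mean number of cells per microdroplet."""
    p_empty = poisson_pmf(0, mean_cells)
    p_single = poisson_pmf(1, mean_cells)
    # Probability of one to four cells in a microdroplet
    p_one_to_four = sum(poisson_pmf(k, mean_cells) for k in range(1, 5))
    p_occupied = 1.0 - p_empty
    return {
        "empty": p_empty,
        "single": p_single,
        "one_to_four": p_one_to_four,
        # Among microdroplets that contain any cell, the share holding exactly one
        "single_among_occupied": p_single / p_occupied,
    }


if __name__ == "__main__":
    # Hypothetical mean loadings (cells per microdroplet)
    for mean in (0.1, 0.5, 1.0):
        summary = occupancy_summary(mean)
        print(mean, {name: round(value, 4) for name, value in summary.items()})
```

Under this assumed model, lowering the mean loading raises the share of occupied
microdroplets that contain a single cell, which is one way to reason about the
dilution used before encapsulation.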

In one aspect, the growth column comprises a capillary, such as a 
capillary array, e.g., a GIGAMATRIX™ (Diversa Corporation, San Diego, CA). The 
growth column can comprise a chromatography column, or equivalent. 

In one aspect, conditions allowing the encapsulated cell to survive and 
be maintained comprise providing nutrients at in situ concentrations. The conditions
allowing the encapsulated cell to survive and be maintained can comprise flowing an
aqueous nutrient mixture through the growth column.

In one aspect, the method further comprises incubating and culturing
the encapsulated cell in the growth column under conditions allowing growth or
proliferation of the cells into a microcolony comprising at least two daughter cells.
The microcolony can comprise between about 2, 3, 4, 5, 6, 7, 8, 9, 10 and about 100,
200, 300 or more cells.

In one aspect, the method further comprises isolating a gel 
microdroplet. The method can comprise isolating a microcolony from the gel 
microdroplet. The method can comprise isolating a cell from the microcolony. In 
one aspect, the isolating of a gel microdroplet can comprise sorting an encapsulated 
microcolony by size, e.g., by using flow cytometry. In one aspect, the gel
microdroplet is isolated by FACS. 
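
The size-based sorting described above can be sketched as a simple gate on light
scatter signals. This is a minimal illustration, not the sorting logic of any
particular instrument: the use of forward scatter as a size proxy and side
scatter as an indicator of a microcolony inside a microdroplet, the threshold
values, and all names are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One flow cytometry event (arbitrary units; hypothetical record layout)."""
    forward_scatter: float  # assumed proxy for particle size
    side_scatter: float     # assumed proxy for content inside the particle


def classify(event: Event,
             fsc_droplet_min: float = 200.0,
             ssc_colony_min: float = 50.0) -> str:
    """Assign an event to a gate using hypothetical thresholds."""
    if event.forward_scatter < fsc_droplet_min:
        # Too small to be a gel microdroplet
        return "free_cells_or_debris"
    if event.side_scatter < ssc_colony_min:
        # Microdroplet-sized but with little internal scatter
        return "empty_microdroplet"
    return "microcolony_microdroplet"


def select_microcolonies(events: list) -> list:
    """Keep only the events gated as microdroplets containing a microcolony."""
    return [e for e in events if classify(e) == "microcolony_microdroplet"]


if __name__ == "__main__":
    sample = [Event(50.0, 10.0), Event(300.0, 10.0), Event(300.0, 120.0)]
    for e in sample:
        print(e, classify(e))
```

In practice the gates would be drawn from measured scatter distributions rather
than fixed constants.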

In one aspect, the method further comprises maintaining the isolated 
cell by re-encapsulating and re-culturing the isolated cell. In one aspect, between 
about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 and 100 or more 
cells are maintained in each re-encapsulated microcolony.

In one aspect, the method further comprises screening the interactions 
between encapsulated cells. In one aspect, the method further comprises re-culturing 
the isolated gel microdroplet under the same or different conditions. In one aspect, 
the method further comprises direct amplification of nucleic acid from the 
encapsulated cell. In one aspect, the method further comprises direct amplification of
nucleic acid from the cultivated encapsulated cells. 

The invention also provides methods for identifying a polynucleotide
encoding an activity of interest comprising encapsulating in a microenvironment at
least a single cell from a mixed population; placing the encapsulated cell in a
growth column; incubating the encapsulated cell in the growth column under
conditions allowing the encapsulated cell to survive and be maintained; contacting a
nucleic acid isolated or derived from the encapsulated cell with at least one nucleic
acid probe comprising a detectable label, wherein the nucleic acid probe is capable of
specifically hybridizing to a polynucleotide encoding an activity of interest; and,
detecting a specific hybridization between a nucleic acid isolated or derived from the
encapsulated cell and the nucleic acid probe, thereby identifying a polynucleotide
encoding an activity of interest. In one aspect, the method further comprises
enriching for a polynucleotide encoding an activity of interest by isolating or
amplifying the nucleic acid identified by the specific hybridization between the
nucleic acid isolated or derived from the encapsulated cell and the nucleic acid probe.

In one aspect, nucleic acids or nucleic acid libraries derived from mixed
populations of nucleic acids and/or organisms are screened very rapidly for bioactivities
of interest utilizing liquid phase screening methods. These libraries can represent the
genomes of multiple organisms, species or subspecies. In one aspect, the libraries are
screened via hybridization methods, such as "biopanning", or by activity based screening
methods. High throughput screening can be performed by utilizing single cell screening
systems, such as fluorescence activated cell sorting (FACS) or by capillary array-based
systems.

The invention provides novel bioactive molecules other than enzymes. 
In one aspect, antibiotics, antivirals, antitumor agents and regulatory proteins are 
discovered utilizing the methods of the present invention. 

The present invention provides methods and compositions to access this
untapped biodiversity and to rapidly screen for polynucleotides, proteins and small
molecules of interest utilizing high throughput screening of multiple samples. These
biomolecules can be derived from cultured or uncultured samples of organisms. In one
aspect, the methods of the present invention provide a method for high throughput
cultivation of unculturable microorganisms.

In one aspect, the present invention provides methods to study molecules 
which affect the interaction of ligands with receptors, e.g., G proteins with receptors. 

In one aspect, the present invention provides a process for identifying
clones having a specified activity of interest, which process comprises (i) generating one
or more gene libraries derived from nucleic acid isolated from a mixed population of
organisms; and (ii) screening said libraries utilizing a high throughput cell analyzer, e.g.,
a fluorescence activated cell sorter or a non-optical cell sorter, to identify said clones.

The invention provides a process for identifying clones having a
specified activity of interest by (i) generating one or more libraries, e.g., expression
libraries, made to contain nucleic acid directly or indirectly isolated from a mixed
population of organisms; (ii) exposing said libraries to a particular substrate or
substrates of interest; and (iii) screening said exposed libraries utilizing a high
throughput cell analyzer, e.g., a fluorescence activated cell sorter or a non-optical cell
sorter, to identify clones which react with the substrate or substrates.

In another aspect, the invention also provides a process for identifying
clones having a specified activity of interest by (i) generating one or more gene
libraries derived from nucleic acid directly or indirectly isolated from a mixed
population of organisms; and (ii) screening said exposed libraries utilizing an assay
requiring a binding event or the covalent modification of a target, and a high
throughput cell analyzer, e.g., a fluorescence activated cell sorter or non-optical cell
sorter, to identify positive clones.

The invention further provides a method of screening for an agent that
modulates the activity of a target protein or other cell component (e.g., nucleic acid),
wherein the target and a selectable marker are expressed by a recombinant cell, by
co-encapsulating the agent in a microenvironment with the recombinant cell expressing the
target and detectable marker and detecting the effect of the agent on the activity of the
target cell component.

In another aspect, the invention provides a method for enriching for
target DNA sequences containing at least a partial coding region for at least one
specified activity in a DNA sample by co-encapsulating a mixture of target DNA
obtained from a mixture of organisms with a mixture of DNA probes including a
detectable marker and at least a portion of a DNA sequence encoding at least one
enzyme having a specified enzyme activity and a detectable marker; incubating the
co-encapsulated mixture under such conditions and for such time as to allow hybridization
of complementary sequences and screening for the target DNA. Optionally the method
further comprises transforming host cells with recovered target DNA to produce an
expression library of a plurality of clones.

The invention further provides a method of screening for an agent that
modulates the interaction of a first test protein linked to a DNA binding moiety and a
second test protein linked to a transcriptional activation moiety by co-encapsulating the
agent with the first test protein and second test protein in a suitable microenvironment
and determining the ability of the agent to modulate the interaction of the first test
protein linked to a DNA binding moiety with the second test protein covalently linked to
a transcriptional activation moiety, wherein the agent enhances or inhibits the expression
of a detectable protein.

In yet another aspect, the present invention provides a method for
identifying a polynucleotide in a liquid phase, including contacting a plurality of
polynucleotides derived from at least one organism, e.g., a mixed population of
organisms, including microorganisms or plant tissue, with at least one nucleic acid
probe under conditions that allow hybridization of the probe to the polynucleotides
having complementary sequences, wherein the probe is labeled with a detectable
molecule (e.g., a fluorescent, magnetic or other molecule). The detectable molecule
changes, e.g., fluoresces, upon interaction of the probe with a target polynucleotide in the
library. Clones from the library are then separated with an analyzer that detects the
change in the detectable molecule, e.g., fluorescence, magnetic field or dielectric
signature. The detectable molecule may also be a bioluminescent molecule, a
chemiluminescent molecule, a colorimetric molecule, an electromagnetic molecule,
an isotopic molecule, a thermal molecule or an enzymatic substrate. The separated
clones can be contacted with a reporter system that identifies a polynucleotide encoding
a polypeptide or a small molecule of interest, for example, and the clones capable of
modulating expression or activity of the reporter system are identified, thereby identifying a
polynucleotide of interest. The liquid phase of the aspect includes in a solution
(cell-free), in a cell, or in a non-solid phase.

In another aspect, the invention provides a method for identifying a
polynucleotide encoding a polypeptide of interest. The method includes
co-encapsulating in a microenvironment a plurality of library clones containing DNA
obtained from a mixed population of organisms with a mixture of oligonucleotide probes
comprising a detectable marker and at least a portion of a polynucleotide sequence
encoding a polypeptide of interest having a specified bioactivity. The encapsulated
clones are incubated under such conditions and for such time as to allow interaction of
complementary sequences, and clones containing a complement to the oligonucleotide
probe encoding the polypeptide of interest are identified by separating clones with a
fluorescent analyzer or non-optical analyzer that detects the detectable marker.

In yet another aspect, the invention provides a method for high
throughput screening of a polynucleotide library for a polynucleotide of interest that
encodes a molecule of interest. The method includes contacting a library containing a
plurality of clones comprising polynucleotides derived from a mixed population of
organisms with a plurality of oligonucleotide probes labeled with a detectable molecule
wherein said detectable molecule becomes detectable upon interaction of the probe with a
target polynucleotide in the library; separating clones with an analyzer that detects the
detectable marker; contacting the separated clones with a reporter system that identifies a
polynucleotide encoding the molecule of interest; and identifying clones capable of
modulating expression or activity of the reporter system thereby identifying a
polynucleotide of interest.

In another aspect, the invention provides a method of screening for a
polynucleotide encoding an activity of interest. The method includes (a) obtaining
polynucleotides from a sample containing a mixed population of organisms; (b)
normalizing the polynucleotides obtained from the sample; (c) generating a library from
the normalized polynucleotides; (d) contacting the library with a plurality of
oligonucleotide probes comprising a detectable marker and at least a portion of a
polynucleotide sequence encoding a polypeptide of interest having a specified activity to
select library clones positive for a sequence of interest; (e) selecting clones with an
analyzer (e.g., a fluorescent or non-optical analyzer) that detects the marker; (f)
contacting the selected clones with a reporter system that identifies a polynucleotide
encoding the activity of interest; and (g) identifying clones capable of modulating
expression or activity of the reporter system thereby identifying a polynucleotide of
interest; wherein the positive clones contain a polynucleotide sequence encoding an
activity of interest which is capable of catalyzing the bioactive substrate.

In yet another aspect, the present invention provides a method for
screening polynucleotides, comprising contacting a library of polynucleotides derived
from a mixed population of organisms with a probe oligonucleotide labeled with a
detectable molecule, which is detectable upon binding of the probe to a target
polynucleotide of the library, to select library polynucleotides positive for a sequence of
interest; separating library members that are positive for the sequence of interest with an
analyzer that detects the molecule; expressing the selected polynucleotides to obtain
polypeptides; contacting the polypeptides with a reporter system; and identifying
polynucleotides encoding polypeptides capable of modulating expression or activity of
the reporter system.

In another aspect, the invention provides a method for obtaining an
organism from a mixed population of organisms in a sample. The method includes
encapsulating in a microenvironment at least one organism from the sample; incubating
the encapsulated organism under such conditions and for such a time to allow the at least
one microorganism to grow or proliferate; and sorting the encapsulated organism by
flow cytometry to obtain an organism from the sample.

In another aspect, the invention provides a method for identifying a
polynucleotide in a liquid phase comprising: a) contacting a plurality of
polynucleotides derived from at least one organism with at least one nucleic acid
probe under conditions that allow hybridization of the probe to the polynucleotides
having complementary sequences, wherein the probe is labeled with a detectable
molecule; and b) identifying a polynucleotide of interest with an analyzer that detects
the detectable molecule.

In one aspect, the methods use a sample screening apparatus including
a plurality of capillaries formed into an array of adjacent capillaries, wherein each
capillary comprises at least one wall defining a lumen for retaining a sample. The
apparatus further includes interstitial material disposed between adjacent capillaries in
the array, and one or more reference indicia formed within the interstitial material.

In one aspect, the methods use a capillary for screening a sample, 
wherein the capillary is adapted for being bound in an array of capillaries, includes a 
first wall defining a lumen for retaining the sample, and a second wall formed of a 
filtering material, for filtering excitation energy provided to the lumen to excite the
sample.

According to yet another aspect of the invention, a method for
incubating a bioactivity or biomolecule of interest includes the steps of introducing a
first component into at least a portion of a capillary of a capillary array, wherein each
capillary of the capillary array comprises at least one wall defining a lumen for
retaining the first component, and introducing an air bubble into the capillary behind
the first component. The method further includes the step of introducing a second
component into the capillary, wherein the second component is separated from the
first component by the air bubble.

In one aspect, the invention provides a method of incubating a sample
of interest that includes introducing a first liquid labeled with a detectable particle into
a capillary of a capillary array, wherein each capillary of the capillary array comprises
at least one wall defining a lumen for retaining the first liquid and the detectable
particle, and wherein the at least one wall is coated with a binding material for
binding the detectable particle to the at least one wall. The method further includes
removing the first liquid from the capillary tube, wherein the bound detectable
particle is maintained within the capillary, and introducing a second liquid into the
capillary tube.

Another aspect of the invention includes a recovery apparatus for a
sample screening system, wherein the system includes a plurality of capillaries
formed into an array. The recovery apparatus includes a recovery tool adapted to
contact at least one capillary of the capillary array and recover a sample from the at
least one capillary. The recovery apparatus further includes an ejector, connected
with the recovery tool, for ejecting the recovered sample from the recovery tool.

The invention provides a universal and novel method that provides
access to this immense reservoir of untapped microbial diversity. This technique
combines compartmentalized microcolonies with flow cytometry for massively
parallel microbial cultivation. The invention provides the ability to grow and study
these organisms in pure culture. It revolutionizes our understanding of microbial
physiology and metabolic adaptation and provides new sources of novel microbial
metabolites. The invention can be applied to samples from several different
environments, including seawater, sediments, and soil.

The details of one or more embodiments of the invention are set forth 
in the accompanying drawings and the description below. Other features, objects, and 
advantages of the invention will be apparent from the description and drawings, and 
from the claims.

All publications mentioned herein are incorporated herein by reference 
in full for the purpose of describing and disclosing the databases, proteins, and
methodologies, which are described in the publications which might be used in
connection with the presently described invention. The publications discussed above 
and throughout the text are provided solely for their disclosure prior to the filing date 
of the present application. Nothing herein is to be construed as an admission that the 
inventors are not entitled to antedate such disclosure by virtue of prior invention. 

All publications, patents, patent applications, GenBank sequences and 
ATCC deposits, cited herein are hereby expressly incorporated by reference for all 
purposes. 

BRIEF DESCRIPTION OF THE FIGURES 

The following drawings are illustrative of embodiments of the invention 
and are not meant to limit the scope of the invention as encompassed by the claims. 

Figure 1 illustrates the protocol used in the cell sorting method of the
invention to screen for a polynucleotide of interest, in this case using a library excised
into E. coli. The clones of interest are isolated by sorting.

Figure 2 shows a microtiter plate where clones or cells are sorted in 
accordance with the invention. Typically one cell or cells grown within a microdroplet 
are dispersed per well and grown up as clones. 

Figure 3 depicts a co-encapsulation assay. Cells containing library 
clones are co-encapsulated with a substrate or labeled oligonucleotide. Encapsulation 
can occur in a variety of means, including GMDs, liposomes, and ghost cells. Cells are 
screened via high throughput screening on a fluorescence analyzer. 

Figure 4 depicts a side scatter versus forward scatter graph of FACS 
sorted gel-microdroplets (GMDs) containing a species of Streptomyces which forms 
unicells. Empty gel-microdroplets are also distinguished from free cells and debris.

Figure 5 is a depiction of a FACS/Biopanning method described herein 
and described in Example 3, below. 

Figure 6A shows an example of dimensions of a capillary array of the
invention. Figure 6B illustrates an array of capillary arrays. 

Figure 7 shows a top cross-sectional view of a capillary array. 

Figure 8 is a schematic depicting the excitation of and emission from a 
sample within the capillary lumen according to one aspect of the invention.

Figure 9 is a schematic depicting the filtering of excitation and
emission light to and from a sample within the capillary lumen according to an 
alternative aspect of the invention. 

Figure 10 illustrates an aspect of the invention in which a capillary
array is wicked by contacting a sample containing cells, and incubated in a
humidified incubator, followed by imaging and recovery of cells in the capillary array.

Figure 11 illustrates a method for incubating a sample in a capillary
tube by an evaporative and capillary wicking cycle. 

Figure 12A shows a portion of a surface of a capillary array on which
condensation has formed. Figure 12B shows the portion of the surface of the capillary
array, depicted in Figure 12A, in which the surface is coated with a hydrophobic layer
to inhibit condensation near an end of individual capillaries.

Figures 13A, 13B and 13C depict a method of retaining at least two
components within a capillary.

Figure 14A depicts capillary tubes containing paramagnetic beads and
cells. Figure 14B depicts the use of the paramagnetic beads to stir a sample in a
capillary tube.

Figure 15 depicts an excitation apparatus for a detection system 
according to an aspect of the invention. 

Figure 16 illustrates a system for screening samples using a capillary
array according to an aspect of the invention.

Figure 17A illustrates one example of a recovery technique useful for
recovering a sample from a capillary array. In this depiction a needle is contacted
with a capillary containing a sample to be obtained. A vacuum is created to evacuate
the sample from the capillary tube and onto a filter. Figure 17B illustrates one sample
recovery method in which the recovery device has an outer diameter greater than the
inner diameter of the capillary from which a sample is being recovered. Figure 17C
illustrates another sample recovery method in which the recovery device has an outer
diameter approximately equal to or less than the inner diameter of the capillary.
Figure 17D shows the further processing of the sample once evacuated from the
capillary.

Figure 18 is a schematic showing high throughput enrichment of low
copy gene targets.

Figure 19 is a schematic of FACS-Biopanning using high throughput 
culturing. Polyketide synthase sequences from environmental samples are shown in 
the alignment.

Figure 20 shows whole cell hybridization for biopanning.

Figure 21 is a schematic showing co-encapsulation of a eukaryotic cell
and a bacterial cell.

Figure 22 illustrates a whole cell hybridization schematic for
biopanning and FACS sorting.

Figure 23 shows a schematic of T7 RNA Polymerase Expression
system.

Figure 24 is a schematic summarizing an exemplary protocol to 
determine the optimal growth medium for a broad diversity of organisms, as 
described in detail in Example 18, below.

Figure 25 is an illustration of a light scattering signature of 
microcolonies as detected and separated by flow cytometry, as described in detail in 
Example 18, below.

Figures 26a, 26b and 26c are schematic drawings summarizing the 
characterization of clones (microcolonies) from organisms found and isolated by a
method of the invention and analyzed by 16S rRNA gene sequence analysis, as 
described in detail in Example 18, below. Figure 26d is an illustration of a picture of 
a culture designated as strain GMDJE10E6, as described in detail in Example 18, 
below. 

Like reference symbols in the various drawings indicate like elements.

DETAILED DESCRIPTION OF THE INVENTION 
The invention provides a novel high throughput cultivation method based 
on the combination of a single cell encapsulation procedure with flow cytometry that 
enables cells to grow with nutrients that are present at environmental concentrations.

The present invention provides a method for rapid sorting and screening
of libraries derived from a mixed population of organisms from, for example, an
environmental sample or an uncultivated population of organisms. In one aspect, gene
libraries are generated, clones are either exposed to a substrate or substrates of interest,
or hybridized to a fluorescently labeled probe having a sequence corresponding to a
sequence of interest, and positive clones are identified and isolated via fluorescence
activated cell sorting. Cells can be viable or non-viable during the process or at the end
of the process, as nucleic acids encoding a positive activity can be isolated and cloned
utilizing techniques well known in the art.

This invention differs from fluorescence activated cell sorting, as
normally performed, in several aspects. Previously, FACS machines have been
employed in studies focused on the analyses of eukaryotic and prokaryotic cell lines and
cell culture processes. FACS has also been utilized to monitor production of foreign
proteins in both eukaryotes and prokaryotes to study, for example, differential gene
expression. The detection and counting capabilities of the FACS system have been
applied in these examples. However, FACS has never previously been employed in a
discovery process to screen for and recover bioactivities in prokaryotes. In addition,
non-optical methods have not been used to identify or discover novel bioactivities or
biomolecules. Furthermore, the present invention does not require cells to survive, as do
previously described technologies, since the desired nucleic acid (recombinant clones)
can be obtained from live or dead cells. For example, the cells only need to be viable
long enough to contain, carry or synthesize a complementary nucleic acid sequence to be
detected, and can thereafter be either viable or non-viable cells so long as the
complementary sequence remains intact. The present invention also solves problems
that would have been associated with detection and sorting of E. coli expressing
recombinant enzymes, and recovering encoding nucleic acids. The invention includes
within its aspects apparatus capable of detecting a molecule or marker that is indicative
of a bioactivity or biomolecule of interest, including optical and non-optical apparatus.

In one aspect, the present invention includes within its aspects any
apparatus capable of detecting fluorescent wavelengths associated with biological
material; such apparatuses are defined herein as fluorescent analyzers (one example of
which is a FACS apparatus).

In the methods of the invention, use of a culture-independent approach to
directly clone genes encoding novel enzymes from, for example, an environmental
sample containing a mixed population of organisms allows one to access untapped
resources of biodiversity. In one aspect, the invention is based on the construction of
"mixed population libraries" which represent the collective genomes of naturally 
occurring organisms archived in cloning vectors that can be propagated in suitable 
prokaryotic hosts. Because the cloned DNA is initially extracted directly from 
environmental samples, the libraries are not limited to the small fraction of prokaryotes 
that can be grown in pure culture. Additionally, a normalization of the DNA present in 
these samples could allow more equal representation of the DNA from all of the species 
present in the original sample. This can increase the efficiency of finding interesting 
genes from minor constituents of the sample which may be under-represented by several 
orders of magnitude compared to the dominant species. 
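
The benefit of normalization can be illustrated with a simple sampling estimate.
The sketch below assumes, as a simplification not made in this specification,
that each library clone is drawn independently and that the chance a clone
carries DNA from a given species equals that species' fraction of the DNA pool.
The function name and the example fractions are hypothetical.

```python
import math


def clones_needed(fraction: float, confidence: float = 0.99) -> int:
    """Number of clones to sample so that a species contributing `fraction`
    of the DNA pool is represented at least once with probability `confidence`.

    Uses P(at least one) = 1 - (1 - fraction) ** n, solved for n.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1 (exclusive)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1 (exclusive)")
    # log1p(-x) computes log(1 - x) accurately for small x
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-fraction))


if __name__ == "__main__":
    # Hypothetical minor constituent: 0.01% of the DNA before normalization,
    # 1% after normalization across roughly one hundred species.
    print("before normalization:", clones_needed(1e-4))
    print("after normalization: ", clones_needed(1e-2))
```

Under these assumptions, raising the representation of a minor constituent by
two orders of magnitude lowers the number of clones that must be examined by
roughly the same factor.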

Prior to the present invention, the evaluation of complex mixed 
population expression libraries was rate limiting. The present invention allows the rapid 
screening of complex mixed population libraries, containing, for example, genes from 
thousands of different organisms. The benefits of the present invention can be seen, for 
example, in screening a complex mixed population sample. Screening of a complex 
sample previously required one to use labor intensive methods to screen several million 
clones to cover the genomic biodiversity. The invention represents an extremely
high-throughput screening method which allows one to assess this enormous number of
clones. The method disclosed herein allows the screening of anywhere from about 30
million to about 200 million clones per hour for a desired nucleic acid sequence or
biological activity. This allows the thorough screening of mixed population libraries for 
clones expressing novel biomolecules. 
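
For orientation, the hourly screening rates stated above can be converted into
events per second and into the time needed to cover a library of a given size.
The sketch is plain arithmetic; the function names and the library size of one
billion clones are hypothetical values chosen for illustration.

```python
SECONDS_PER_HOUR = 3600.0


def events_per_second(clones_per_hour: float) -> float:
    """Convert an hourly screening rate into events per second."""
    return clones_per_hour / SECONDS_PER_HOUR


def hours_to_screen(library_size: float, clones_per_hour: float) -> float:
    """Hours needed to pass every clone of a library through the analyzer once."""
    return library_size / clones_per_hour


if __name__ == "__main__":
    library_size = 1e9  # hypothetical library of one billion clones
    for rate in (30e6, 200e6):  # the rate range stated in the text
        print(f"{rate:.0f} clones/hour = {events_per_second(rate):.0f} events/s; "
              f"{hours_to_screen(library_size, rate):.1f} h for the library")
```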

The invention provides methods and compositions whereby one can 
screen, sort or identify a polynucleotide sequence, polypeptide, or molecule of interest 
from a mixed population of organisms (e.g., organisms present in a mixed population 
sample) based on polynucleotide sequences present in the sample. Thus, the 
invention provides methods and compositions useful in screening organisms for a 
desired biological activity or biological sequence and to assist in obtaining sequences 
of interest that can further be used in directed evolution, molecular biology, 
biotechnology and industrial applications. By screening and identifying the nucleic 
acid sequences present in the sample, the invention increases the repertoire of 
available sequences that can be used for the development of diagnostics, therapeutics 
or molecules for industrial applications. Accordingly, the methods of the invention
can identify novel nucleic acid sequences encoding proteins or polypeptides having a 
desired biological activity. 

In one aspect, the invention provides a method for high throughput
culturing of organisms. In one aspect, the organisms are a mixed population of
organisms. In another aspect, the organisms include host cells of a library containing
nucleic acids. For example, such libraries include nucleic acid obtained from various
isolates of organisms, which are then pooled; nucleic acid obtained from isolate
libraries, which are then pooled; or nucleic acids derived directly from a mixed
population of organisms. Generally, a sample containing the organisms is mixed with
a composition that can form a microenvironment, as described herein, e.g., a gel
microdroplet or a liposome. In one aspect, as illustrated in Example 8, a mixed
population of microorganisms is mixed with the encapsulation material in such a way
that preferably fewer than 5 microorganisms are encapsulated per microenvironment.
Preferably, only one microorganism is encapsulated in each microenvironment system.
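
The relationship between cell loading and single-cell occupancy can be sketched with a Poisson model. The use of Poisson statistics here is an assumption made for illustration (it presumes cells are distributed randomly and independently among microenvironments) and is not a procedure stated in this disclosure; the function name and the example loading values are likewise hypothetical.

```python
import math


def occupancy_fractions(mean_cells_per_droplet: float) -> dict:
    """Return the expected fractions of empty, singly occupied and multiply
    occupied microenvironments, assuming Poisson-distributed loading."""
    if mean_cells_per_droplet < 0:
        raise ValueError("mean_cells_per_droplet must be non-negative")
    lam = mean_cells_per_droplet
    p_empty = math.exp(-lam)           # P(0 cells)
    p_single = lam * math.exp(-lam)    # P(exactly 1 cell)
    p_multiple = 1.0 - p_empty - p_single
    return {"empty": p_empty, "single": p_single, "multiple": p_multiple}


for lam in (0.1, 0.5, 1.0):  # hypothetical mean loadings
    f = occupancy_fractions(lam)
    occupied = 1.0 - f["empty"]
    print(f"mean {lam}: {f['single'] / occupied:.1%} of occupied droplets hold one cell")
```

Under this model, lower mean loading raises the share of occupied microenvironments that contain exactly one organism, at the cost of more "empty" microenvironments to be removed by sorting.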

Once encapsulated, the cells are cultured in a manner which allows
growth of the organisms, e.g., host cells of a library. For example, Example 8
provides growth of the encapsulated organisms in a chromatography column which
allows a flow of growth medium providing nutrients for growth and for removal of
waste products from cells. Over a period of time (20 minutes to several weeks or
months), a clonal population of the preferably one organism grows within the
microenvironment.

After a desired period of time, microenvironments, e.g., gel
microdroplets, can be sorted to eliminate "empty" microenvironments and to sort for
the occupied microenvironments. The nucleic acid from organisms in the sorted
microenvironments can be studied directly, for example, by treating with a PCR
mixture and amplifying immediately after sorting. In one Example described herein,
16S rRNA genes from individual cells were studied and organisms assessed for
phylogenetic diversity from the samples.

In another aspect, the high throughput culturing methods of the
invention allow culturing of organisms and enrichment of low copy gene targets. For
example, a library of nucleic acid obtained from various isolates of organisms, which
are then pooled; nucleic acid obtained from isolate libraries, which are then pooled; or
nucleic acids derived directly from a mixed population of organisms, for example, are
encapsulated, e.g., in a gel microdroplet or other microenvironment, and grown under
conditions which allow clonal expansion of each organism in the microenvironment.
In one aspect, the cells of the clonal population are lysed and treated with proteinases
to yield nucleic acid (see Figures) (e.g., the microcolonies are de-proteinized by
incubating gel microdroplets in lysis solution containing proteinase K at 37 degrees C
for 30 minutes). Nucleic acid entrapped in the microenvironments is then denatured
with alkaline denaturing solution (0.5 M NaOH) and neutralized (e.g., with Tris pH 8).
In one particular example, nucleic acid entrapped in the microenvironment is
hybridized with digoxigenin (DIG)-labeled oligonucleotides (30-50 nt) in DIG Easy Hyb
(available from Roche) overnight at 37 degrees C, followed by washing with 0.3xSSC
and 0.1xSSC at 38-50 degrees C to achieve desired stringency. One of skill in the art
will appreciate that this is merely an example and not meant to limit the invention in
any way. For example, other labels commonly used in the art, e.g., fluorescent labels
such as GFP or chemiluminescent labels, can be utilized in the invention methods.

The nucleic acid is hybridized with a probe which is preferably
labeled. A signal can be amplified with a secondary label (e.g., fluorescent) and the
nucleic acid sorted for fluorescent microenvironments, e.g., gel microdroplets.
Nucleic acid that is fluorescent can be isolated and further studied or cloned into a
host cell for further manipulation. In one particular example, signals are amplified
with a Tyramide Signal Amplification™ (TSA) kit from Molecular Probes. TSA is an
enzyme-mediated signal amplification method that utilizes horseradish peroxidase
(HRP) to deposit fluorogenic tyramide molecules and generate high-density labeling
of a target nucleic acid sequence in situ. The signal amplification is conferred by the
turnover of multiple tyramide substrates per HRP molecule, and increases in signal
strength of over 1,000-fold have been reported. The procedure involves incubating
gel microdroplets (GMDs) with anti-DIG conjugated horseradish peroxidase
(anti-DIG-HRP) (Roche, IN) for 3 hours at room temperature (RT). The tyramide
substrate solution is then added and incubated for 30 minutes at RT.

In one aspect, this high throughput culturing method, followed by
sorting (e.g., FACS) and screening (e.g., biopanning), allows for identification of gene
targets. It may be desirable to screen for nucleic acids encoding virtually any protein
or any bioactivity and to compare such nucleic acids among various species of
organisms in a sample (e.g., study polyketide sequences from a mixed population). In
another aspect, nucleic acid derived from high throughput culturing of organisms can
be obtained for further study or for generation of a library. Such nucleic acid can be
pooled and a library created, or alternatively, individual libraries from clonal
populations of organisms can be generated and then nucleic acid pooled from those
libraries to generate a more complex library. The libraries generated as described
herein can be utilized for the discovery of biomolecules (e.g., nucleic acid or
bioactivities) or for evolving nucleic acid molecules identified by the high throughput
culturing methods described in the present invention.

Such evolution methods are known in the art or described herein, such
as shuffling, cassette mutagenesis, recursive ensemble mutagenesis, sexual PCR,
directed evolution, exonuclease-mediated reassembly, codon site-saturation
mutagenesis, amino acid site-saturation mutagenesis, gene site saturation
mutagenesis, introduction of mutations by non-stochastic polynucleotide reassembly
methods, synthetic ligation polynucleotide reassembly, gene reassembly,
oligonucleotide-directed saturation mutagenesis, in vivo reassortment of
polynucleotide sequences having partial homology, naturally occurring recombination
processes which reduce sequence complexity, and any combination thereof.

Flow cytometry has been used in cloning and selection of variants from
existing cell clones. This selection, however, has required stains that diffuse through
cells passively, rapidly and irreversibly, with no toxic effects or other influences on
metabolic or physiological processes. Since, typically, flow sorting has been used to
study animal cell culture performance, physiological state of cells, and the cell cycle, one
goal of cell sorting has been to keep the cells viable during and after sorting.

There currently are no reports in the literature of screening and discovery
of polynucleotide sequences in libraries by cell sorting based on fluorescence (e.g.,
fluorescence activated cell sorting) or non-optical markers (e.g., magnetic fields and the
like). Furthermore, there are no reports of recovering DNA encoding bioactivities
screened by FACS or non-optical techniques and additionally screening for a bioactivity
of interest. The present invention provides these methods to allow the extremely rapid
screening of viable or non-viable cells to recover desirable activities and the nucleic acid
encoding those activities.

Different types of encapsulation (e.g., gel microdroplet) strategies and
compounds or polymers can be used with the present invention. For instance, high
temperature agaroses can be employed for making microdroplets stable at high
temperatures, allowing stable encapsulation of cells subsequent to heat-kill steps utilized
to remove all background activities when screening for thermostable bioactivities.
Encapsulation can be in beads, high temperature agaroses, gel microdroplets, cells
(such as ghost red blood cells or macrophages), liposomes, or any other means of
encapsulating and localizing molecules. For example, methods of preparing
liposomes have been described (e.g., U.S. Patent Nos. 5,653,996, 5,393,530 and
5,651,981), as well as the use of liposomes to encapsulate a variety of molecules (U.S.
Patent Nos. 5,595,756, 5,605,703, 5,627,159, 5,652,225, 5,567,433, 4,235,871 and
5,227,170). Entrapment of proteins, viruses, bacteria and DNA in erythrocytes during
endocytosis has been described, as well (Journal of Applied Biochemistry 4, 418-435
(1982)). Erythrocytes employed as carriers in vitro or in vivo for substances
entrapped during hypo-osmotic lysis or dielectric breakdown of the membrane have
also been described (reviewed in Ihler, G. M. (1983) J. Pharm. Ther). These
techniques are useful in the present invention to encapsulate samples for screening.

"Microenvironment", as used herein, is any molecular structure which 
provides an appropriate environment for facilitating the interactions necessary for the 
method of the invention. An environment suitable for facilitating molecular
interactions includes, for example, gel microdroplets, ghost cells, macrophages or
liposomes.

Liposomes can be prepared from a variety of lipids including
phospholipids, glycolipids, steroids, long-chain alkyl esters (e.g., alkyl phosphates),
fatty acid esters (e.g., lecithin), fatty amines and the like. A mixture of fatty material
may be employed, such as a combination of a neutral steroid, a charged amphiphile and a
phospholipid. Illustrative examples of phospholipids include lecithin, sphingomyelin
and dipalmitoylphosphatidylcholine. Representative steroids include cholesterol,
cholestanol and lanosterol. Representative charged amphiphilic compounds generally
contain from 12-30 carbon atoms. Examples include mono- or dialkyl phosphate esters
and alkyl amines, e.g., dicetyl phosphate, stearyl amine, hexadecyl amine, dilauryl
phosphate, and the like.

The invention methods include a system and method for holding and
screening samples. According to one aspect of the invention, a sample screening
apparatus includes a plurality of capillaries formed into an array of adjacent
capillaries, wherein each capillary comprises at least one wall defining a lumen for
retaining a sample. The apparatus further includes interstitial material disposed
between adjacent capillaries in the array, and one or more reference indicia formed
within the interstitial material (see co-pending U.S. patent applications serial nos.
09/687,219 and 09/894,956).

According to another aspect of the invention, a capillary for screening
a sample, wherein the capillary is adapted for being bound in an array of capillaries,
includes a first wall defining a lumen for retaining the sample, and a second wall
formed of a filtering material, for filtering excitation energy provided to the lumen to
excite the sample.

In another aspect of the invention, a method for incubating a
bioactivity or biomolecule of interest includes the steps of introducing a first
component into at least a portion of a capillary of a capillary array, wherein each
capillary of the capillary array comprises at least one wall defining a lumen for
retaining the first component, and introducing an air bubble into the capillary behind
the first component. The method further includes the step of introducing a second
component into the capillary, wherein the second component is separated from the
first component by the air bubble.

In one aspect of the invention, a method of incubating a sample of
interest includes introducing a first liquid labeled with a detectable particle into a
capillary of a capillary array, wherein each capillary of the capillary array comprises
at least one wall defining a lumen for retaining the first liquid and the detectable
particle, and wherein the at least one wall is coated with a binding material for
binding the detectable particle to the at least one wall. The method further includes
removing the first liquid from the capillary tube, wherein the bound detectable
particle is maintained within the capillary, and introducing a second liquid into the
capillary tube.

Another aspect of the invention includes a recovery apparatus for a 
sample screening system, wherein the system includes a plurality of capillaries 
formed into an array. The recovery apparatus includes a recovery tool adapted to 
contact at least one capillary of the capillary array and recover a sample from the at 
least one capillary. The recovery apparatus further includes an ejector, connected 
with the recovery tool, for ejecting the recovered sample from the recovery tool. 

Definitions

Unless defined otherwise, all technical and scientific terms used herein 
have the same meaning as commonly understood by one of ordinary skill in the art to
which the invention belongs. Although any methods, devices and materials similar or 
equivalent to those described herein can be used in the practice or testing of the 
invention, the methods, devices and materials are now described. 

As used herein and in the appended claims, the singular forms "a," "an,"
and "the" include plural referents unless the context clearly dictates otherwise. Thus, for 
example, reference to "a clone" includes a plurality of clones and reference to "the 
nucleic acid sequence" generally includes reference to one or more nucleic acid 
sequences and equivalents thereof known to those skilled in the art, and so forth. 

An "amino acid" is a molecule having the structure wherein a central 
carbon atom (the α-carbon atom) is linked to a hydrogen atom, a carboxylic acid group
(the carbon atom of which is referred to herein as a "carboxyl carbon atom"), an amino
group (the nitrogen atom of which is referred to herein as an "amino nitrogen atom"),
and a side chain group, R. When incorporated into a peptide, polypeptide, or protein, an
amino acid loses one or more atoms of its amino and carboxylic groups in the
dehydration reaction that links one amino acid to another. As a result, when
incorporated into a protein, an amino acid is referred to as an "amino acid residue."

"Protein" or "polypeptide" refers to any polymer of two or more 
individual amino acids (whether or not naturally occurring) linked via a peptide bond,
which occurs when the carboxyl carbon atom of the carboxylic acid group bonded to the
α-carbon of one amino acid (or amino acid residue) becomes covalently bound to the
amino nitrogen atom of the amino group bonded to the α-carbon of an adjacent amino acid.
The term "protein" is understood to include the terms "polypeptide" and "peptide"
(which, at times, may be used interchangeably herein) within its meaning. In addition,
proteins comprising multiple polypeptide subunits (e.g., DNA polymerase III, RNA
polymerase II) or other components (for example, an RNA molecule, as occurs in
telomerase) will also be understood to be included within the meaning of "protein" as
used herein. Similarly, fragments of proteins and polypeptides are also within the scope
of the invention and may be referred to herein as "proteins."

A particular amino acid sequence of a given protein (i.e., the
polypeptide's "primary structure," when written from the amino-terminus to
carboxy-terminus) is determined by the nucleotide sequence of the coding portion of a mRNA,
which is in turn specified by genetic information, typically genomic DNA (including
organelle DNA, e.g., mitochondrial or chloroplast DNA). Thus, determining the
sequence of a gene assists in predicting the primary sequence of a corresponding
polypeptide and more particularly the role or activity of the polypeptide or proteins
encoded by that gene or polynucleotide sequence.

The term "isolated" means altered "by the hand of man" from its natural
state; i.e., if it occurs in nature, it has been changed or removed from its original
environment, or both. For example, a naturally occurring polynucleotide or a
polypeptide naturally present in a living animal, a biological sample or an environmental
sample in its natural state is not "isolated", but the same polynucleotide or polypeptide
separated from the coexisting materials of its natural state is "isolated", as the term is
employed herein. Such polynucleotides, when introduced into host cells in culture or in
whole organisms, still would be isolated, as the term is used herein, because they would
not be in their naturally occurring form or environment. Similarly, the polynucleotides
and polypeptides may occur in a composition, such as a media formulation (solutions for
introduction of polynucleotides or polypeptides, for example, into cells or compositions
or solutions for chemical or enzymatic reactions).

"Polynucleotide" or "nucleic acid sequence" refers to a polymeric form 
of nucleotides. In some instances a polynucleotide refers to a sequence that is not
immediately contiguous with either of the coding sequences with which it is
immediately contiguous (one on the 5' end and one on the 3' end) in the naturally
occurring genome of the organism from which it is derived. The term therefore includes,
for example, a recombinant DNA which is incorporated into a vector; into an
autonomously replicating plasmid or virus; or into the genomic DNA of a prokaryote or
eukaryote, or which exists as a separate molecule (e.g., a cDNA) independent of other
sequences. The nucleotides of the invention can be ribonucleotides,
deoxyribonucleotides, or modified forms of either nucleotide. A polynucleotide as used
herein refers to, among others, single- and double-stranded DNA, DNA that is a mixture
of single- and double-stranded regions, single- and double-stranded RNA, and RNA that
is a mixture of single- and double-stranded regions, hybrid molecules comprising DNA
and RNA that may be single-stranded or, more typically, double-stranded or a mixture of
single- and double-stranded regions. In addition, polynucleotide as used herein refers to
triple-stranded regions comprising RNA or DNA or both RNA and DNA. The strands in
such regions may be from the same molecule or from different molecules. The regions
may include all of one or more of the molecules, but more typically involve only a
region of some of the molecules. One of the molecules of a triple-helical region often is
an oligonucleotide. The term polynucleotide encompasses genomic DNA or RNA
(depending upon the organism, e.g., the RNA genome of viruses), as well as mRNA encoded
by the genomic DNA, and cDNA.

By rapidly screening for polynucleotides encoding polypeptides of
interest, the invention provides not only a source of materials for the development of
biologics, therapeutics, and enzymes for industrial applications, but also provides new
materials for further processing by, for example, directed evolution and mutagenesis to
develop molecules or polypeptides modified for particular activity or conditions.

The invention is used to obtain and identify polynucleotides and related
sequence specific information from, for example, infectious microorganisms present in
the environment such as, for example, in the gut of various macroorganisms.

In another aspect, the methods and compositions of the invention provide
for the identification of lead drug compounds present in an environmental sample. The
methods of the invention provide the ability to mine the environment for novel drugs or
identify related drugs contained in different microorganisms. There are several common
sources of lead compounds (drug candidates), including natural product collections,
synthetic chemical collections, and synthetic combinatorial chemical libraries, such as
nucleotides, peptides, or other polymeric molecules that have been identified or
developed as a result of environmental mining. Each of these sources has advantages
and disadvantages. The success of programs to screen these candidates depends largely
on the number of compounds entering the programs, and pharmaceutical companies
have to date screened hundreds of thousands of synthetic and natural compounds in
search of lead compounds. Unfortunately, the ratio of novel to previously-discovered
compounds has diminished with time. The discovery rate of novel lead compounds has
not kept pace with demand despite the best efforts of pharmaceutical companies. There
exists a strong need for accessing new sources of potential drug candidates.
Accordingly, the invention provides a rapid and efficient method to identify and
characterize environmental samples that may contain novel drug compounds.

The invention provides methods of identifying a nucleic acid sequence
encoding a polypeptide having either known or unknown function. For example, much
of the diversity in microbial genomes results from the rearrangement of gene clusters in
the genome of microorganisms. These gene clusters can be present across species or
phylogenetically related with other organisms.

For example, bacteria and many eukaryotes have a coordinated
mechanism for regulating genes whose products are involved in related processes. The
genes are clustered, in structures referred to as "gene clusters," on a single chromosome
and are transcribed together under the control of a single regulatory sequence, including
a single promoter which initiates transcription of the entire cluster. The gene cluster, the
promoter, and additional sequences that function in regulation altogether are referred to
as an "operon" and can include up to 20 or more genes, usually from 2 to 6 genes. Thus,
a gene cluster is a group of adjacent genes that are either identical or related, usually as
to their function. Gene clusters are generally 15 kb to greater than 120 kb in length.

Some gene families consist of identical members. Clustering is a
prerequisite for maintaining identity between genes, although clustered genes are not
necessarily identical. Gene clusters range from extremes where a duplication
generates adjacent related genes to cases where hundreds of identical genes lie in a
tandem array. Sometimes no significance is discernible in a repetition of a particular
gene. A principal example of this is the expressed duplicate insulin genes in some
species, whereas a single insulin gene is adequate in other mammalian species.

Further, gene clusters undergo continual reorganization and, thus, the
ability to create heterogeneous libraries of gene clusters from, for example, bacterial or
other prokaryote sources is valuable in determining sources of novel proteins,
particularly including enzymes such as, for example, the polyketide synthases that are
responsible for the synthesis of polyketides having a vast array of useful activities.
Other types of proteins that are the product(s) of gene clusters are also contemplated,
including, for example, antibiotics, antivirals, antitumor agents and regulatory proteins,
such as insulin.

As an example, polyketide synthase enzymes fall in a gene cluster.
Polyketides are molecules which are an extremely rich source of bioactivities, including
antibiotics (such as tetracyclines and erythromycin), anti-cancer agents (daunomycin),
immunosuppressants (FK506 and rapamycin), and veterinary products (monensin).
Many polyketides (produced by polyketide synthases) are valuable as therapeutic agents.
Polyketide synthases are multifunctional enzymes that catalyze the biosynthesis of a
huge variety of carbon chains differing in length and patterns of functionality and
cyclization. Polyketide synthase genes fall into gene clusters and at least one type
(designated type I) of polyketide synthase has large size genes and enzymes,
complicating genetic manipulation and in vitro studies of these genes/proteins.

The ability to select and combine desired components from a library of
polyketides and postpolyketide biosynthesis genes for generation of novel polyketides
for study is appealing. The method(s) of the present invention make it possible to clone,
and facilitate the cloning of, novel polyketide synthases, since one can generate gene banks
with clones containing large inserts (especially when using the f-factor based vectors),
which facilitates cloning of gene clusters.

Other biosynthetic genes include NRPS, glycosyl transferases and p450s.
For example, a gene cluster can be ligated into a vector containing expression
regulatory sequences which can control and regulate the production of a detectable
protein or protein-related array activity from the ligated gene clusters. Vectors
which have an exceptionally large capacity for exogenous nucleic acid introduction are
particularly appropriate for use with such gene clusters and are described by way of
example herein to include artificial chromosome vectors, cosmids, and the f-factor (or
fertility factor) of E. coli. For example, the f-factor of E. coli is a plasmid which effects
high-frequency transfer of itself during conjugation and is ideal to achieve and stably
propagate large nucleic acid fragments, such as gene clusters from samples of mixed
populations of organisms.

The nucleic acid isolated or derived from these samples (e.g., a mixed
population of microorganisms) can preferably be inserted into a vector or a plasmid prior
to screening of the polynucleotides. Such vectors or plasmids are typically those
containing expression regulatory sequences, including promoters, enhancers and the like.

The invention provides novel systems to clone and screen mixed
populations of organisms present, for example, in environmental samples, for
polynucleotides of interest, enzymatic activities and bioactivities of interest in vitro. The
method(s) of the invention allow the cloning and discovery of novel bioactive molecules
in vitro, and in particular novel bioactive molecules derived from uncultivated or
cultivated samples. Large size gene clusters, genes and gene fragments can be cloned,
sequenced and screened using the method(s) of the invention. Unlike previous
strategies, the method(s) of the invention allow one to clone, screen and identify
polynucleotides and the polypeptides encoded by these polynucleotides in vitro from a
wide range of mixed population samples.

The invention allows one to screen for and identify polynucleotide
sequences from complex mixed population samples. DNA libraries obtained from these
samples can be created from cell free samples, so long as the sample contains nucleic
acid sequences, or from samples containing cellular organisms or viral particles. The
organisms from which the libraries may be prepared include prokaryotic
microorganisms, such as Eubacteria and Archaebacteria, lower eukaryotic
microorganisms such as fungi, algae and protozoa, as well as plants, plant spores and
pollen. The organisms may be cultured organisms or uncultured organisms obtained
from mixed population environmental samples, including extremophiles, such as
thermophiles, hyperthermophiles, psychrophiles and psychrotrophs.

Sources of nucleic acids used to construct a DNA library can be obtained
from mixed population samples, such as, but not limited to, microbial samples obtained
from Arctic and Antarctic ice, water or permafrost sources, materials of volcanic origin,
materials from soil or plant sources in tropical areas, droppings from various organisms
including mammals and invertebrates, as well as dead and decaying matter, etc. Thus, for
example, nucleic acids may be recovered from either a cultured or non-cultured
organism and used to produce an appropriate DNA library (e.g., a recombinant
expression library) for subsequent determination of the identity of the particular
polynucleotide sequence or screening for bioactivity.

The following outlines a general procedure for producing libraries from
both culturable and non-culturable organisms as well as mixed populations of organisms,
which libraries can be probed, sequenced or screened to select therefrom nucleic acid
sequences having an identified, desired or predicted biological activity (e.g., an
enzymatic activity or a small molecule).

As used herein, a mixed population sample is any sample containing
organisms or polynucleotides or a combination thereof, which can be obtained from any
number of sources (as described above), including, for example, insect feces, soil, water,
etc. Any source of nucleic acids in purified or non-purified form can be utilized as
starting material. Thus, the nucleic acids may be obtained from any source which is
contaminated by an organism or from any sample containing cells. The mixed
population sample can be an extract from any bodily sample such as blood, urine, spinal
fluid, tissue, vaginal swab, stool, amniotic fluid or buccal mouthwash from any
mammalian organism. For non-mammalian organisms (e.g., invertebrates) the sample
can be a tissue sample, salivary sample, fecal material or material in the digestive tract of
the organism. An environmental sample also includes samples obtained from extreme
environments including, for example, hot sulfur pools, volcanic vents, and frozen tundra.
In addition, the sample can come from a variety of sources. For example, in horticulture
and agricultural testing the sample can be a plant, fertilizer, soil, liquid or other
horticultural or agricultural product; in food testing the sample can be fresh food or
processed food (for example infant formula, seafood, fresh produce and packaged food);
and in environmental testing the sample can be liquid, soil, sewage treatment, sludge and
any other sample in the environment which is considered or suspected of containing an
organism or polynucleotides.

When the sample is a mixture of material (e.g., a mixed population of
organisms), for example, blood, soil and sludge, it can be treated with an appropriate
reagent which is effective to open the cells and expose or separate the strands of nucleic
acids. Mixed populations can comprise pools of cultured organisms or samples. For
example, samples of organisms can be cultured prior to analysis in order to purify a
particular population and thus obtain a purer sample. Organisms, such as
actinomycetes or myxobacteria, known to produce bioactivities of interest can be
enriched for via culturing. Culturing of organisms in the sample can include culturing
the organisms in microdroplets and separating the cultured microdroplets with a cell
sorter into individual wells of a multi-well tissue culture plate from which further
processing may be performed.

The sample can comprise nucleic acids from, for example, a diverse and
mixed population of organisms (e.g., microorganisms present in the gut of an insect).
Nucleic acids are isolated from the sample using any number of methods for DNA and
RNA isolation. Such nucleic acid isolation methods are commonly performed in the art.
Where the nucleic acid is RNA, the RNA can be reverse transcribed to DNA using
primers known in the art. Where the DNA is genomic DNA, the DNA can be sheared
using, for example, a 25 gauge needle.

The nucleic acids can be cloned into a vector. Cloning techniques are
known in the art or can be developed by one skilled in the art, without undue
experimentation. Vectors used in the present invention include: plasmids, phages,
cosmids, phagemids, viruses (e.g., retroviruses, parainfluenzavirus, herpesviruses,
reoviruses, paramyxoviruses, and the like), artificial chromosomes, or selected portions
thereof (e.g., coat protein, spike glycoprotein, capsid protein). For example, cosmids and
phagemids are typically used where the specific nucleic acid sequence to be analyzed or
modified is large because these vectors are able to stably propagate large
polynucleotides.

The vector containing the cloned DNA sequence can then be amplified
by plating (i.e., clonal amplification) or transfecting a suitable host cell with the vector
(e.g., a phage on an E. coli host). Alternatively (or subsequent to amplification), the
cloned DNA sequence is used to prepare a library for screening by transforming a
suitable organism. Hosts known in the art are transformed by artificial introduction of
the vectors containing the target nucleic acid by inoculation under conditions conducive
for such transformation. One could transform with double stranded circular or linear
nucleic acid or there may also be instances where one would transform with single
stranded circular or linear nucleic acid sequences. By transform or transformation is
meant a permanent or transient genetic change induced in a cell following incorporation
of new DNA (i.e., DNA exogenous to the cell). Where the cell is a mammalian cell, a
permanent genetic change is generally achieved by introduction of the DNA into the
genome of the cell. A transformed cell or host cell generally refers to a cell (e.g.,
prokaryotic or eukaryotic) into which (or into an ancestor of which) has been introduced,
by means of recombinant DNA techniques, a DNA molecule not normally present in the
host organism.

A particular type of vector for use in the invention contains an f-factor
origin of replication. The f-factor (or fertility factor) in E. coli is a plasmid which effects
high frequency transfer of itself during conjugation and less frequent transfer of the
bacterial chromosome itself. In a particular aspect cloning vectors referred to as
"fosmids" or bacterial artificial chromosome (BAC) vectors are used. These are derived
from E. coli f-factor which is able to stably integrate large segments of DNA. When
integrated with DNA from a mixed, uncultured population sample, this makes it
possible to achieve large genomic fragments in the form of a stable "mixed population
nucleic acid library."

The nucleic acids derived from a mixed population or sample may be
inserted into the vector by a variety of procedures. In general, the nucleic acid sequence
is inserted into an appropriate restriction endonuclease site(s) by procedures known in
the art. Such procedures and others are deemed to be within the scope of those skilled in
the art. A typical cloning scenario may have the DNA "blunted" with an appropriate
nuclease (e.g., Mung Bean Nuclease), methylated with, for example, EcoR I Methylase
and ligated to EcoR I linkers. The linkers are then digested with an EcoR I Restriction
Endonuclease and the DNA size fractionated (e.g., using a sucrose gradient). The
resulting size fractionated DNA is then ligated into a suitable vector for sequencing,
screening or expression (e.g., a lambda vector and packaged using an in vitro lambda
packaging extract).
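
The purpose of the methylation step in the cloning scenario above can be illustrated with a short in silico sketch. It is not part of the described method: the function names and the example insert are hypothetical, and the only outside fact assumed is that EcoR I recognizes GAATTC and cuts after the G. Unmethylated internal sites are cut when the linkers are digested, whereas a methylated (protected) insert survives intact.

```python
ECORI_SITE = "GAATTC"  # EcoR I recognition sequence; the cut falls between G and AATTC


def find_sites(seq, site=ECORI_SITE):
    """Return the 0-based start position of every occurrence of `site` in `seq`."""
    upper = seq.upper()
    positions = []
    start = upper.find(site)
    while start != -1:
        positions.append(start)
        start = upper.find(site, start + 1)
    return positions


def digest(seq, protected):
    """Cut `seq` at each EcoR I site unless the sites are methylated (protected)."""
    if protected:
        return [seq]
    fragments = []
    previous = 0
    for pos in find_sites(seq):
        cut = pos + 1  # the cut falls after the G of GAATTC
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


# Hypothetical insert carrying one internal EcoR I site
insert = "ATGCCGAATTCTTAGC"
unprotected_fragments = digest(insert, protected=False)  # insert is split in two
protected_fragments = digest(insert, protected=True)     # insert stays whole
```

In this toy model the unprotected insert is lost as two fragments, which is why the insert DNA is methylated before the EcoR I linkers are added and digested.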

Transformation of a host cell with recombinant DNA may be carried out
by conventional techniques as are well known to those skilled in the art. Where the host
is prokaryotic, such as E. coli, competent cells which are capable of DNA uptake can be
prepared from cells harvested after exponential growth phase and subsequently treated
by the CaCl2 method by procedures well known in the art. Alternatively, MgCl2 or RbCl
can be used. Transformation can also be performed after forming a protoplast of the host
cell or by electroporation. Transformation of Pseudomonas fluorescens and yeast host
cells can be achieved by electroporation, using techniques described herein.

When the host is a eukaryote, methods of transfection or transformation
with DNA that may be used include conjugation, calcium phosphate co-precipitates,
conventional mechanical procedures such as microinjection, electroporation, insertion of
a plasmid encased in liposomes, or virus vectors, as well as others known in the art.
Eukaryotic cells can also be cotransfected with a second foreign DNA molecule
encoding a selectable marker, such as the herpes simplex thymidine kinase gene.
Another method is to use a eukaryotic viral vector, such as simian virus 40 (SV40) or
bovine papilloma virus, to transiently infect or transform eukaryotic cells and express the
protein. (Eukaryotic Viral Vectors, Cold Spring Harbor Laboratory, Gluzman ed., 1982).
The eukaryotic cell may be a yeast cell (e.g., Saccharomyces cerevisiae), an insect cell
(e.g., Drosophila sp.) or may be a mammalian cell, including a human cell.

Eukaryotic systems, and mammalian expression systems, allow for
post-translational modifications of expressed mammalian proteins to occur. Eukaryotic cells
which possess the cellular machinery for processing of the primary transcript,
glycosylation, phosphorylation, and, advantageously, secretion of the gene product
should be used. Such host cell lines may include, but are not limited to, CHO, VERO,
BHK, HeLa, COS, MDCK, Jurkat, HEK-293, and WI38.

After the gene libraries have been generated one can perform
"biopanning" of the libraries prior to expression screening. The "biopanning" procedure
refers to a process for identifying clones having a specified biological activity by
screening for sequence homology in the library of clones, using at least one probe DNA
comprising at least a portion of a DNA sequence encoding a polypeptide having the
specified biological activity; and detecting interactions of the probe DNA with a
substantially complementary sequence in a clone. Clones (either viable or non-viable)
are then separated by an analyzer (e.g., a FACS apparatus or an apparatus that detects
non-optical markers).

The probe DNA used to probe for the target DNA of interest contained in
clones prepared from polynucleotides in a mixed population of organisms can be a
full-length coding region sequence or a partial coding region sequence of DNA for a known
bioactivity. The sequence of the probe can be generated by synthetic or recombinant
means and can be based upon computer based sequencing programs or biological
sequences present in a clone. The DNA library can be probed using mixtures of probes
comprising at least a portion of the DNA sequence encoding a known bioactivity having
a desired activity. These probes or probe libraries are preferably single-stranded. The
probes that are particularly suitable are those derived from DNA encoding bioactivities
having an activity similar or identical to the specified bioactivity which is to be screened.

In another aspect, a nucleic acid library from a mixed population of
organisms is screened for a sequence of interest by transfecting a host cell containing the
library with at least one labeled nucleic acid sequence which is all or a portion of a DNA
sequence encoding a bioactivity having a desirable activity and separating the library
clones containing the desirable sequence by optical- or non-optical-based analysis.

In another aspect, in vivo biopanning may be performed utilizing a
FACS-based machine. Complex gene libraries are constructed with vectors which
contain elements which stabilize transcribed RNA. For example, the inclusion of
sequences which result in secondary structures such as hairpins which are designed to
flank the transcribed regions of the RNA would serve to enhance their stability, thus
increasing their half life within the cell. The probe molecules used in the biopanning
process consist of oligonucleotides labeled with reporter molecules that only fluoresce
upon binding of the probe to a target molecule. Various dyes or stains well known in
the art, for example those described in "Practical Flow Cytometry", 1995 Wiley-Liss,
Inc., Howard M. Shapiro, M.D., can be used to intercalate or associate with nucleic
acid in order to "label" the oligonucleotides. These probes are introduced into the
recombinant cells of the library using one of several transformation methods. The
probe molecules interact or hybridize to the transcribed target mRNA or DNA
resulting in DNA/RNA heteroduplex molecules or DNA/DNA duplex molecules.
Binding of the probe to a target will yield a fluorescent signal which is detected and
sorted by the FACS machine during the screening process.

The probe DNA can be at least about 10 bases, or at least 15 bases.
Other size ranges for probe DNA are at least about 15 bases to about 100 bases, at least
about 100 bases to about 500 bases, at least about 500 bases to about 1,000 bases, at least
about 1,000 bases to about 5,000 bases and at least about 5,000 bases to about 10,000
bases. In one aspect, an entire coding region of one part of a pathway may be employed
as a probe. Where the probe is hybridized to the target DNA in an in vitro system,
conditions for the hybridization in which target DNA is selectively isolated by the use of
at least one DNA probe will be designed to provide a hybridization stringency of at least
about 50% sequence identity, more particularly a stringency providing for a sequence
identity of at least about 70%. Hybridization techniques for probing a microbial DNA
library to isolate target DNA of potential interest are well known in the art and any of
those which are described in the literature are suitable for use herein. Prior to
fluorescence sorting the clones may be viable or non-viable. For example, in one aspect,
the cells are fixed with paraformaldehyde prior to sorting.
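
The sequence identity thresholds above (at least about 50%, more particularly at least about 70%) can be illustrated with a minimal in silico sketch. This is an assumption-laden illustration, not a model of hybridization chemistry: it compares a probe with every ungapped window of a clone sequence and reports the best percent identity. The function names and the example sequences are hypothetical.

```python
def percent_identity(probe, target):
    """Percent of positions that match; both sequences must have the same length."""
    if not probe or len(probe) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == t for p, t in zip(probe.upper(), target.upper()))
    return 100.0 * matches / len(probe)


def best_window_identity(probe, clone):
    """Slide the probe along the clone (no gaps) and return the best identity."""
    n = len(probe)
    if len(clone) < n:
        return 0.0
    return max(percent_identity(probe, clone[i:i + n])
               for i in range(len(clone) - n + 1))


def passes_threshold(probe, clone, min_identity=50.0):
    """True if some window of the clone reaches the chosen identity threshold."""
    return best_window_identity(probe, clone) >= min_identity


# Hypothetical 10-base probe and a clone that matches it at 9 of 10 positions
probe = "ACGTACGTAC"
clone = "TTTACGTACGAACGG"
best = best_window_identity(probe, clone)
```

Raising `min_identity` plays the role of raising stringency: the same clone is retained at a 70% threshold and rejected at a 95% threshold.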

Once viable or non-viable clones containing a sequence substantially
complementary to the probe DNA are separated by a fluorescence analyzer,
polynucleotides present in the separated clones may be further manipulated. In some
instances, it may be desirable to perform an amplification of the target DNA that has
been isolated. In this aspect, the target DNA is separated from the probe DNA after
isolation. In one aspect, the clone can be grown to expand the clonal population.
Alternatively, the host cell is lysed and the target DNA is amplified
before being used to transform a new host (e.g., subcloning). Long PCR (Barnes, W M,
Proc. Natl. Acad. Sci, USA, Mar. 15, 1994) can be used to amplify large DNA fragments
(e.g., 35 kb). Numerous amplification methodologies are now well known in the art.

Where the target DNA is identified in vitro, the selected DNA is then
used for preparing a library for further processing and screening by transforming a
suitable organism. Hosts can be transformed by artificial introduction of a vector
containing a target DNA by inoculation under conditions conducive for such
transformation.

The resultant libraries (enriched for a polynucleotide of interest) can then 
be screened for clones which display an activity of interest. Clones can be shuttled in 
alternative hosts for expression of active compounds, or screened using methods 
described herein. 

Having prepared a multiplicity of clones from DNA selectively isolated
via hybridization technologies described herein, such clones are screened for a specific
activity to identify clones having a specified characteristic.

The screening for activity may be effected on individual expression
clones or may be initially effected on a mixture of expression clones to ascertain whether
or not the mixture has one or more specified activities. If the mixture has a specified
activity, then the individual clones may be rescreened for such activity or for a more
specific activity.
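
The mixture-first strategy described above can be sketched as a two-stage pooled screen. This is a minimal sketch under stated assumptions: `has_activity` is a hypothetical stand-in for whatever assay is used, a pool is assumed to test positive whenever any member clone is positive, and the pool size and clone identifiers are illustrative only.

```python
def screen_pools(clones, has_activity, pool_size):
    """Assay pools first; rescreen individual clones only from positive pools.

    Returns the list of positive clones and the total number of assays run.
    """
    hits = []
    assays = 0
    for start in range(0, len(clones), pool_size):
        pool = clones[start:start + pool_size]
        assays += 1  # one assay on the mixture
        if any(has_activity(clone) for clone in pool):
            for clone in pool:
                assays += 1  # one assay per individual clone in a positive pool
                if has_activity(clone):
                    hits.append(clone)
    return hits, assays


# Hypothetical library of 100 clones in which clones 7 and 42 are active
library = list(range(100))
active = {7, 42}
hits, assays = screen_pools(library, lambda clone: clone in active, pool_size=10)
```

In this example the two positives are recovered with 30 assays (10 pools plus 20 individual rescreens) instead of 100 individual assays; the saving shrinks as positives become more frequent.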

Prior to, subsequent to or as an alternative to the in vivo biopanning
described above, an encapsulation technique such as GMDs may be employed
to localize at least one clone in one location for growth or screening by a fluorescent
analyzer (e.g., FACS). The at least one separated clone contained in the GMD may then
be cultured to expand the number of clones or screened on a FACS machine to identify
clones containing a sequence of interest as described above, which can then be broken
out into individual clones to be screened again on a FACS machine to identify positive
individual clones. Screening in this manner using a FACS machine is described in
patent application Ser. No. 08/876,276, filed June 16, 1997. Thus, for example, if a
clone has a desirable activity, then the individual clones may be recovered and
rescreened utilizing a FACS machine to determine which of such clones has the
specified desirable activity.

Further, it is possible to combine some or all of the above aspects such
that a normalization step is performed prior to generation of the expression library, the
expression library is then generated, the expression library so generated is then
biopanned, and the biopanned expression library is then screened using a high
throughput cell sorting and screening instrument. Thus there are a variety of options,
including: (i) generating the library and then screening it; (ii) normalizing the target
DNA, generating the expression library and screening it; (iii) normalizing, generating the
library, biopanning and screening; or (iv) generating, biopanning and screening the library.
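
The four options listed above differ only in whether the optional normalization and biopanning steps are included around the two fixed steps. A minimal sketch, with hypothetical step labels, enumerates them:

```python
from itertools import product


def workflow_options():
    """Enumerate workflows: normalization and biopanning are each optional,
    while library generation and screening are always performed."""
    options = []
    for normalize, biopan in product((False, True), repeat=2):
        steps = []
        if normalize:
            steps.append("normalize")
        steps.append("generate library")
        if biopan:
            steps.append("biopan")
        steps.append("screen")
        options.append(tuple(steps))
    return options


options = workflow_options()
for steps in options:
    print(" -> ".join(steps))
```

The two independent yes/no choices give exactly the four combinations (i) through (iv), with the step order fixed as normalize, generate, biopan, screen.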

The library may, for example, be screened for a specified enzyme
activity. For example, the enzyme activity screened for may be one or more of the six
IUB classes: oxidoreductases, transferases, hydrolases, lyases, isomerases and ligases.
The recombinant enzymes which are determined to be positive for one or more of the
IUB classes may then be rescreened for a more specific enzyme activity.

Alternatively, the library may be screened for a more specialized
enzyme activity. For example, instead of generically screening for hydrolase activity,
the library may be screened for a more specialized activity, i.e., the type of bond on
which the hydrolase acts. Thus, for example, the library may be screened to ascertain
those hydrolases which act on one or more specified chemical functionalities, such as:
(a) amide (peptide bonds), i.e., proteases; (b) ester bonds, i.e., esterases and lipases; (c)
acetals, i.e., glycosidases, etc.
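
The two-tier logic above (a generic class screen followed by a bond-specific rescreen for hydrolases) can be sketched as a small lookup structure. The assay callables `class_assay` and `bond_assay` are hypothetical placeholders for real assays; only the class names and the bond-to-enzyme pairs come from the text.

```python
IUB_CLASSES = ("oxidoreductase", "transferase", "hydrolase",
               "lyase", "isomerase", "ligase")

# Bond acted upon by a hydrolase -> more specific enzyme type
HYDROLASE_SUBSCREENS = {
    "amide (peptide) bond": "protease",
    "ester bond": "esterase or lipase",
    "acetal": "glycosidase",
}


def two_tier_screen(clone_id, class_assay, bond_assay):
    """Screen a clone for each IUB class, then rescreen hydrolase-positive
    clones for the type of bond acted upon.

    class_assay(clone_id, cls) and bond_assay(clone_id, bond) are hypothetical
    callables that return True when the corresponding assay is positive.
    """
    result = {}
    for cls in IUB_CLASSES:
        if not class_assay(clone_id, cls):
            continue
        if cls == "hydrolase":
            result[cls] = [enzyme for bond, enzyme in HYDROLASE_SUBSCREENS.items()
                           if bond_assay(clone_id, bond)]
        else:
            result[cls] = []
    return result


# Hypothetical clone that is hydrolase-positive and acts on ester bonds
example = two_tier_screen(
    "clone-1",
    class_assay=lambda clone, cls: cls == "hydrolase",
    bond_assay=lambda clone, bond: bond == "ester bond",
)
```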

As described with respect to one of the above aspects, the invention 
provides a process for activity screening of clones containing selected DNA derived 
from a mixed population of organisms or more than one organism. 

The process includes biopanning polynucleotides from a mixed population of organisms by
separating the clones or polynucleotides positive for a sequence of interest with a
fluorescent analyzer that detects fluorescence, to select polynucleotides or clones
containing polynucleotides positive for a sequence of interest, and screening the selected
clones or polynucleotides for a specified bioactivity. In one aspect, the polynucleotides
are contained in clones having been prepared by recovering DNA of a microorganism,
which DNA is selected by hybridization to at least one DNA sequence which is all or a
portion of a DNA sequence encoding a bioactivity having a desirable activity.

In another aspect, a DNA library derived from a microorganism is
subjected to a selection procedure to select therefrom DNA which hybridizes to one or
more probe DNA sequences which is all or a portion of a DNA sequence encoding an
activity having a desirable activity by contacting a DNA library with a fluorescently
labeled DNA probe under conditions permissive of hybridization so as to produce a
double-stranded complex of probe and members of the DNA library.

The present invention offers the ability to screen for many types of
bioactivities. For instance, the ability to select and combine desired components from a
library of polyketides and postpolyketide biosynthesis genes for generation of novel
polyketides for study is appealing. The method(s) of the present invention make
possible, and facilitate, the cloning of novel polyketide synthase genes and/or gene
pathways, and other relevant pathways or genes encoding commercially relevant
secondary metabolites, since one can generate gene banks with clones containing large
inserts (especially when using vectors which can accept large inserts, such as the f-factor
based vectors), which facilitates cloning of gene clusters.

The biopanning approach described above can be used to create libraries
enriched with clones carrying sequences substantially homologous to a given probe
sequence. Using this approach libraries containing clones with inserts of up to 40 kbp or
larger can be enriched approximately 1,000 fold after each round of panning. This
enables one to reduce the number of clones to be screened after one round of biopanning
enrichment. This approach can be applied to create libraries enriched for clones carrying
sequences of interest related to a bioactivity of interest, for example, polyketide
sequences.
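
The effect of an approximately 1,000-fold enrichment per round on screening effort can be worked through numerically. The sketch below makes two assumptions that are not in the text: the starting frequency of positive clones (one in a million) is purely illustrative, and the fold enrichment is applied to the ratio of positive to negative clones so that the resulting fraction can never exceed 1.

```python
def enriched_fraction(initial_fraction, fold=1000.0, rounds=1):
    """Fraction of positive clones after `rounds` of panning, applying the
    fold enrichment to the positive:negative ratio (an assumed model)."""
    if not 0.0 < initial_fraction < 1.0:
        raise ValueError("initial_fraction must be between 0 and 1")
    odds = initial_fraction / (1.0 - initial_fraction)
    odds *= fold ** rounds
    return odds / (1.0 + odds)


def clones_per_positive(fraction):
    """Expected number of clones screened per positive clone found."""
    return 1.0 / fraction


start = 1e-6  # hypothetical: one positive clone per million before panning
after_one = enriched_fraction(start, rounds=1)
after_two = enriched_fraction(start, rounds=2)
for label, fraction in (("start", start), ("1 round", after_one), ("2 rounds", after_two)):
    print(f"{label}: about {clones_per_positive(fraction):,.0f} clones per positive")
```

Under these assumptions, one round reduces the expected screening load from about a million clones per positive to roughly a thousand, and a second round brings positives to about half of the library.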

Hybridization screening using high density filters or biopanning has
proven an efficient approach to detect homologues of pathways containing genes of
interest to discover novel bioactive molecules that may have no known counterparts.
Once a polynucleotide of interest is enriched in a library of clones it may be desirable to
screen for an activity. For example, it may be desirable to screen for the expression of
small molecule ring structures or "backbones". Because the genes encoding these
polycyclic structures can often be expressed in E. coli, the small molecule backbone can
be manufactured, even if in an inactive form. Bioactivity is conferred upon transferring
the molecule or pathway to an appropriate host that expresses the requisite glycosylation
and methylation genes that can modify or "decorate" the structure to its active form.
Thus, ring compounds recombinantly expressed in E. coli, even if inactive, are detected
to identify clones which are then shuttled to a metabolically rich host, such as
Streptomyces (e.g., Streptomyces diversae or venezuelae) for subsequent production of
the bioactive molecule. It should be understood that E. coli can produce active small
molecules and in certain instances it may be desirable, but is not required, to shuttle
clones to a metabolically rich host for "decoration" of the structure. The use of high
throughput robotic systems allows the screening of hundreds of thousands of clones in
multiplexed arrays in microtiter dishes.

One approach to detect and enrich for clones carrying these structures is
to use FACS screening, a procedure described and exemplified in U.S. Ser. No.
08/876,276, filed June 16, 1997. Polycyclic ring compounds typically have characteristic
fluorescent spectra when excited by ultraviolet light. Thus, clones expressing these
structures can be distinguished from background using a sufficiently sensitive detection
method. High throughput FACS screening can be utilized to screen for small molecule
backbones in, for example, E. coli libraries. Commercially available FACS machines
are capable of screening up to 100,000 clones per second for UV active molecules.
These clones can be sorted for further FACS screening or the resident plasmids can be
extracted and shuttled to Streptomyces for activity screening.

In another aspect, a bioactivity or biomolecule or compound is detected
by using various electromagnetic detection devices, including, for example, optical,
magnetic and thermal detection associated with a flow cytometer. Flow cytometers
typically use an optical method of detection (fluorescence, scatter, and the like) to
discriminate individual cells or particles from within a large population. There are
several non-optical technologies that could be used alone or in conjunction with the
optical methods to enable new discrimination/screening paradigms.

Magnetic field sensing is one such technique that can be used as an
alternative to, or in conjunction with, for example, fluorescence based methods. Hall-Effect
Sensors are one example of sensors that can be employed. Superconducting Quantum
Interference Devices ("SQUIDS") are the most sensitive sensors for magnetic flux and
magnetic fields so far developed. A standard criterion for the sensitivity of a SQUID
is its energy resolution. This is defined as the smallest change in energy that the SQUID
can detect in one second (or in a bandwidth of 1 Hz). Typical values are 10^-33 J/Hz. The
utility of SQUIDS can be found in the presence of magnetosomes in certain types of
bacteria that contain chains of permanent single magnetic domain particles of magnetite
(Fe3O4) or greigite (Fe3S4). The magnetic field (or residual magnetic field) of a cell that
contains a magnetosome is detected by positioning a SQUID in close proximity to the
flow stream of a flow cytometer. Using this method cells or cells containing, for
example, magnetic probes can be isolated based on their magnetic properties. As
another example, changes in the synthetic pathway of magnetosome containing bacteria
can be measured using a similar technique. Such techniques can be used to identify
agents which modulate the synthetic pathway of magnetosomes.

Measuring dynamic charge properties is another technique that can be
used as an alternative to, or in conjunction with, for example, fluorescence based methods.
Multipole Coupling Spectroscopy ("MCS") directly measures the dynamic charge
properties of systems without the need for labeling. Structural changes that occur when
molecules interact result in representative changes in charge distribution, and these
produce a dielectric based spectrum or "signature" that reveals the affinity, specificity and
functionality of each interaction. Similar changes in charge distribution occur in cellular
systems. By observing the changes in these signatures, the dynamics of molecular
pathways and cellular function can be resolved in their native conditions. MCS utilizes a
small microwave (500 MHz to 50 GHz) transceiver that could be positioned in close
proximity to the flow stream of a flow cytometer. Because of the short measurement
times (e.g., microseconds) required, a complete MCS signature for each cell within the
stream of a flow cytometer can be generated and analyzed. Certain cells can then be
sorted and/or isolated based on either spectral features that are known a priori or based
on some statistical variation from a general population. Examples of uses for this
technique include selection of expression mutants, small molecule pre-screening, and the
like.
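
Sorting on "some statistical variation from a general population" can be sketched as a simple outlier gate. This is only one possible reading: it assumes each cell's signature has been reduced to a single hypothetical scalar value and flags cells whose value lies more than a chosen number of standard deviations from the population mean. The cutoff and the example readings are illustrative.

```python
from statistics import mean, stdev


def outlier_gate(signals, z_cutoff=3.0):
    """Return indices of cells whose signal deviates from the population mean
    by more than `z_cutoff` sample standard deviations."""
    if len(signals) < 2:
        return []
    mu = mean(signals)
    sigma = stdev(signals)
    if sigma == 0:
        return []  # no variation, nothing to gate on
    return [index for index, value in enumerate(signals)
            if abs(value - mu) / sigma > z_cutoff]


# Hypothetical per-cell readings: 20 typical cells followed by one unusual cell
readings = [10.0, 10.2, 9.8, 10.1, 9.9] * 4 + [25.0]
to_sort = outlier_gate(readings)
```

A gate based on features known a priori would instead compare each signature with a fixed reference, without reference to the population statistics.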

In one screening approach, biomolecules from candidate clones can be
tested for bioactivity by susceptibility screening against test organisms such as
Staphylococcus aureus, Micrococcus luteus, E. coli, or Saccharomyces cerevisiae. FACS
screening can be used in this approach by co-encapsulating clones with the test
organism.

An alternative to the above-mentioned screening methods provided by
the present invention is an approach termed "mixed extract" screening. The "mixed
extract" screening approach takes advantage of the fact that the accessory genes needed
to confer activity upon the polycyclic backbones are expressed in metabolically rich
hosts, such as Streptomyces, and that the enzymes can be extracted and combined with
the backbones extracted from E. coli clones to produce the bioactive compound in vitro.
Enzyme extract preparations from metabolically rich hosts, such as Streptomyces strains,
at various growth stages are combined with pools of organic extracts from E. coli
libraries and then evaluated for bioactivity. Another approach to detect activity in the E.
coli clones is to screen for genes that can convert bioactive compounds to different
forms. For example, a recombinant enzyme was recently discovered that can convert the
low value daunomycin to the higher value doxorubicin. Similar enzyme pathways are
being sought to convert penicillins to cephalosporins.

Screening may be carried out to detect a specified enzyme activity by
procedures known in the art. For example, enzyme activity may be screened for one or
more of the six IUB classes: oxidoreductases, transferases, hydrolases, lyases,
isomerases and ligases. The recombinant enzymes which are determined to be positive
for one or more of the IUB classes may then be rescreened for a more specific enzyme
activity. Alternatively, the library may be screened for a more specialized enzyme
activity. For example, instead of generically screening for hydrolase activity, the library
may be screened for a more specialized activity, i.e., the type of bond on which the
hydrolase acts. Thus, for example, the library may be screened to ascertain those
hydrolases which act on one or more specified chemical functionalities, such as: (a)
amide (peptide bonds), i.e., proteases; (b) ester bonds, i.e., esterases and lipases; (c)
acetals, i.e., glycosidases.

FACS screening can also be used to detect expression of UV fluorescent
molecules in any host, including metabolically rich hosts, such as Streptomyces. For
example, recombinant oxytetracycline retains its diagnostic red fluorescence when
produced heterologously in S. lividans TK24. Pathway clones, which can be sorted by
FACS, can thus be screened for polycyclic molecules in a high throughput fashion.

Recombinant bioactive compounds can also be screened in vivo using
"two-hybrid" systems, which can detect enhancers and inhibitors of protein-protein or
other interactions such as those between transcription factors and their activators, or
receptors and their cognate targets. In this aspect, both the small molecule pathway and
the reporter construct are co-expressed. Clones altered in reporter expression can then be
sorted by FACS and the pathway clone isolated for characterization.

As indicated, common approaches to drug discovery involve screening
assays in which disease targets (macromolecules implicated in causing a disease) are
exposed to potential drug candidates which are tested for therapeutic activity. In other
approaches, whole cells or organisms that are representative of the causative agent of the
disease, such as bacteria or tumor cell lines, are exposed to the potential candidates for
screening purposes. Any of these approaches can be employed with the present
invention.

The present invention also allows for the transfer of cloned pathways
derived from uncultivated samples into metabolically rich hosts for heterologous
expression and downstream screening for bioactive compounds of interest using a
variety of screening approaches briefly described above.

Recovering Desirable Bioactivities

In one aspect, after viable or non-viable cells, each containing a different
expression clone from the gene library, are screened, and positive clones are recovered,
DNA can be isolated from positive clones utilizing techniques well known in the art. The
DNA can then be amplified either in vivo or in vitro by utilizing any of the various
amplification techniques known in the art. In vivo amplification would include
transformation of the clone(s) or subclone(s) into a viable host, followed by growth of
the host. In vitro amplification can be performed using techniques such as the
polymerase chain reaction. Once amplified the identified sequences can be "evolved"
or sequenced.

Evolution

In one aspect, the present invention manipulates the identified
polynucleotides to generate and select for encoded variants with altered activity or
specificity. Clones found to have the bioactivity for which the screen was performed can
be subjected to directed mutagenesis to develop new bioactivities with desired properties
or to develop modified bioactivities with particularly desired properties that are absent or
less pronounced in the wild-type activity, such as stability to heat or organic solvents.
Any of the known techniques for directed mutagenesis are applicable to the invention.
For example, mutagenesis techniques for use in accordance with the invention include
those described below.

Alternatively, it may be desirable to variegate a polynucleotide sequence
obtained, identified or cloned as described herein. Such variegation can modify the
polynucleotide sequence in order to modify (e.g., increase or decrease) the encoded
polypeptide's activity, specificity, affinity, function, etc. Such evolution methods are
known in the art or described herein, such as shuffling, cassette mutagenesis,
recursive ensemble mutagenesis, sexual PCR, directed evolution,
exonuclease-mediated reassembly, codon site-saturation mutagenesis, amino acid
site-saturation mutagenesis, gene site saturation mutagenesis, introduction of mutations by
non-stochastic polynucleotide reassembly methods, synthetic ligation polynucleotide
reassembly, gene reassembly, oligonucleotide-directed saturation mutagenesis, in vivo
reassortment of polynucleotide sequences having partial homology, naturally
occurring recombination processes which reduce sequence complexity, and any
combination thereof.

The clones enriched for a desired polynucleotide sequence, which are
identified as described above, may be sequenced to identify the DNA sequence(s)
present in the clone, which sequence information can be used to screen a database for
similar sequences or functional characteristics. Thus, in accordance with the present
invention it is possible to: (i) isolate and identify DNA having a sequence of interest
(e.g., a sequence encoding an enzyme having a specified enzyme activity), (ii)
associate the sequence with a known or unknown sequence in a database (e.g., a database
sequence associated with an enzyme having an activity (including the amino acid
sequence thereof)), and (iii) produce recombinant enzymes having such activity.

Sequencing may be performed by high-throughput sequencing
techniques. The exact method of sequencing is not a limiting factor of the invention.
Any method useful in identifying the sequence of a particular cloned DNA sequence can
be used. In general, sequencing is an adaptation of the natural process of DNA
replication. Therefore, a template (e.g., the vector) and primer sequences are used. One
general template preparation and sequencing protocol begins with automated picking of
bacterial colonies, each of which contains a separate DNA clone which will function as a
template for the sequencing reaction. The selected clones are placed into media, and
grown overnight. The DNA templates are then purified from the cells and suspended in
water. After DNA quantification, high-throughput sequencing is performed using a
sequencer, such as an Applied Biosystems, Inc., Prism 377 DNA Sequencer. The
resulting sequence data can then be used in additional methods, including to search a
database or databases.

Database Searches and Alignment Algorithms

A number of source databases are available that contain a nucleic
acid sequence and/or a deduced amino acid sequence for use with the invention in
identifying or determining the activity encoded by a particular polynucleotide sequence.
All or a representative portion of the sequences (e.g., about 100 individual clones) to be
tested are used to search a sequence database (e.g., GenBank, PFAM or ProDom), either
simultaneously or individually. A number of different methods of performing such
sequence searches are known in the art. The databases can be specific for a particular
organism or a collection of organisms. For example, there are databases for C.
elegans, Arabidopsis sp., M. genitalium, M. jannaschii, E. coli, H. influenzae, S.
cerevisiae and others. The sequence data of the clone is then aligned to the sequences in
the database or databases using algorithms designed to measure homology between two
or more sequences.

Such sequence alignment methods include, for example, BLAST
(Altschul et al., 1990), BLITZ (MPsrch) (Sturrock & Collins, 1993), and FASTA
(Pearson & Lipman, 1988). The probe sequence (e.g., the sequence data from the clone)
can be any length, and will be recognized as homologous based upon a threshold
homology value. The threshold value may be predetermined, although this is not
required. The threshold value can be based upon the particular polynucleotide length.
To align sequences a number of different procedures can be used. Typically,
Smith-Waterman or Needleman-Wunsch algorithms are used. However, as discussed, faster
procedures such as BLAST, FASTA, PSI-BLAST can be used.

For example, optimal alignment of sequences for aligning a comparison
window may be conducted by the local homology algorithm of Smith (Smith and
Waterman, Adv Appl Math, 1981; Smith and Waterman, J Theor Biol, 1981; Smith and
Waterman, J Mol Biol, 1981; Smith et al, J Mol Evol, 1981), by the homology alignment
algorithm of Needleman (Needleman and Wunsch, 1970), by the search for similarity
method of Pearson (Pearson and Lipman, 1988), by computerized implementations of
these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics
Software Package Release 7.0, Genetics Computer Group, 575 Science Dr., Madison,
WI, or the Sequence Analysis Software Package of the Genetics Computer Group,
University of Wisconsin, Madison, WI), or by inspection, and the best alignment (i.e.,
resulting in the highest percentage of homology over the comparison window) generated
by the various methods is selected. The similarity of the two sequences (i.e., the probe
sequence and the database sequence) can then be predicted.
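
By way of illustration only, the local homology algorithm referred to above can be sketched in Python as follows. The sketch computes only the best local alignment score for two sequences. The function name and the scoring values (match +2, mismatch -1, gap -2) are assumptions chosen for the example and are not parameters specified by the invention.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not values from the text.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in sequence b
            left = H[i][j - 1] + gap    # gap in sequence a
            # A local alignment may restart anywhere, so scores never go below 0.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best
```

A probe sequence could be compared against each database sequence in this manner and ranked by score, although the faster heuristic procedures mentioned above are generally preferred for large databases.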

Such software matches similar sequences by assigning degrees of
homology to various deletions, substitutions and other modifications. The terms
"homology" and "identity" in the context of two or more nucleic acids or polypeptide
sequences, refer to two or more sequences or subsequences that are the same or have a
specified percentage of amino acid residues or nucleotides that are the same when
compared and aligned for maximum correspondence over a comparison window or
designated region as measured using any number of sequence comparison algorithms or
by manual alignment and visual inspection.

For sequence comparison, typically one sequence acts as a reference
sequence, to which test sequences are compared. When using a sequence comparison
algorithm, test and reference sequences are entered into a computer, subsequence
coordinates are designated, if necessary, and sequence algorithm program parameters are
designated. Default program parameters can be used, or alternative parameters can be
designated. The sequence comparison algorithm then calculates the percent sequence
identities for the test sequences relative to the reference sequence, based on the program
parameters.

A "comparison window", as used herein, includes reference to a segment
of any one of the number of contiguous positions selected from the group consisting of
from 20 to 600, usually about 50 to about 200, more usually about 100 to about 150, in
which a sequence may be compared to a reference sequence of the same number of
contiguous positions after the two sequences are optimally aligned.

Examples of algorithms used in the methods of the invention are the
BLAST and BLAST 2.0 algorithms, which are described in Altschul et al, J. Mol. Biol.
215:403-410 (1990) and Altschul et al, Nuc. Acids Res. 25:3389-3402 (1997),
respectively. Software for performing BLAST analyses is publicly available through the
National Center for Biotechnology Information. This algorithm involves first
identifying high scoring sequence pairs (HSPs) by identifying short words of length W
in the query sequence, which either match or satisfy some positive-valued threshold
score T when aligned with a word of the same length in a database sequence. T is
referred to as the neighborhood word score threshold (Altschul et al, supra). These
initial neighborhood word hits act as seeds for initiating searches to find longer HSPs
containing them. The word hits are extended in both directions along each sequence for
as far as the cumulative alignment score can be increased. Cumulative scores are
calculated using, for nucleotide sequences, the parameters M (reward score for a pair of
matching residues; always >0) and N (penalty score for mismatching residues; always
<0). The BLAST algorithm parameters W, T, and X
determine the sensitivity and speed of the alignment. The BLASTN program (for
nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10,
M=5, N=-4 and a comparison of both strands.
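
The seeding and extension steps described above can be sketched, by way of example only, as follows. This simplified Python sketch uses exact word matches only (the neighborhood threshold T is omitted), extends a seed in one direction without gaps, and stops when the score falls a fixed amount below its maximum. The function names and the drop-off value x=20 are assumptions made for the example; W=11, M=5 and N=-4 follow the BLASTN defaults recited above.

```python
def find_seeds(query, subject, w=11):
    """Return (query_pos, subject_pos) pairs where a word of length w matches exactly."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    seeds = []
    for i in range(len(query) - w + 1):
        for j in index.get(query[i:i + w], []):
            seeds.append((i, j))
    return seeds


def extend_right(query, subject, i, j, w=11, m=5, n=-4, x=20):
    """Extend a seed to the right without gaps.

    Returns (best_score, best_length). Extension stops at the end of either
    sequence or once the running score is x or more below the best score seen
    (x is an assumed drop-off value).
    """
    score = best = w * m          # the seed itself is w matching residues
    best_len = w
    k = w
    while i + k < len(query) and j + k < len(subject):
        score += m if query[i + k] == subject[j + k] else n
        k += 1
        if score > best:
            best, best_len = score, k
        elif best - score >= x:
            break
    return best, best_len
```

A complete implementation would also extend to the left, handle both strands, and evaluate the statistical significance of each HSP.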

The BLAST algorithm also performs a statistical analysis of the
similarity between two sequences (see, e.g., Karlin & Altschul, Proc. Natl. Acad. Sci.
USA 90:5873 (1993)). One measure of similarity provided by the BLAST algorithm is the
smallest sum probability (P(N)), which provides an indication of the probability by
which a match between two nucleotide sequences would occur by chance. For example,
a nucleic acid is considered similar to a reference sequence if the smallest sum
probability in a comparison of the test nucleic acid to the reference nucleic acid is less
than about 0.2, more preferably less than about 0.01, and most preferably less than about
0.001.

Sequence homology means that two polynucleotide sequences are
homologous (i.e., on a nucleotide-by-nucleotide basis) over the window of comparison.
A percentage of sequence identity or homology is calculated by comparing two
optimally aligned sequences over the window of comparison, determining the number of
positions at which the identical nucleic acid base (e.g., A, T, C, G, U, or I) occurs in both
sequences to yield the number of matched positions, dividing the number of matched
positions by the total number of positions in the window of comparison (i.e., the window
size), and multiplying the result by 100 to yield the percentage of sequence homology.
This substantial homology denotes a characteristic of a polynucleotide sequence,
wherein the polynucleotide comprises a sequence having at least 60 percent sequence
homology, typically at least 70 percent homology, often 80 to 90 percent sequence
homology, and most commonly at least 99 percent sequence homology as compared to a
reference sequence of a comparison window of at least 25-50 nucleotides, wherein the
percentage of sequence homology is calculated by comparing the reference sequence to
the polynucleotide sequence which may include deletions or additions which total 20
percent or less of the reference sequence over the window of comparison.
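
The percentage calculation described above can be illustrated with the following minimal Python sketch. It assumes that the two sequences have already been optimally aligned over the comparison window and that gaps are written as "-"; both the function name and the gap symbol are assumptions made for the example.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of window positions at which both aligned sequences carry the same base.

    Inputs must be pre-aligned and of equal length; '-' (an assumed gap symbol)
    never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must span the same window")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    # matched positions / window size * 100
    return 100.0 * matches / len(aligned_a)
```

For instance, two aligned sequences of ten positions that differ at one position yield 90 percent, which would satisfy the 60, 70 and 80 percent thresholds recited above but not the 99 percent threshold.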

Sequences having sufficient homology can then be further identified by
any annotations contained in the database, including, for example, species and activity
information. Accordingly, in a typical mixed population sample, a plurality of nucleic
acid sequences will be obtained, cloned, sequenced and corresponding homologous
sequences from a database identified. This information provides a profile of the
polynucleotides present in the sample, including one or more features associated with the
polynucleotide including the organism and activity associated with that sequence or any
polypeptide encoded by that sequence based on the database information. As used herein
"fingerprint" or "profile" refers to the fact that each sample will have associated with it a
set of polynucleotides characteristic of the sample and the environment from which it
was derived. Such a profile can include the amount and type of sequences present in the
sample, as well as information regarding the potential activities encoded by the
polynucleotides and the organisms from which polynucleotides were derived. This
unique pattern is each sample's profile or fingerprint.

In some instances it may be desirable to express a particular cloned
polynucleotide sequence once its identity or activity is determined or a demonstrated
identity or activity is associated with the polynucleotide. In such instances the desired
clone, if not already cloned into an expression vector, is ligated downstream of a
regulatory control element (e.g., a promoter or enhancer) and cloned into a suitable host
cell. Expression vectors are commercially available along with corresponding host cells
for use in the invention.

As representative examples of expression vectors which may be used
there may be mentioned viral particles, baculovirus, phage, plasmids, phagemids,
cosmids, fosmids, bacterial artificial chromosomes, viral nucleic acid (e.g., vaccinia,
adenovirus, fowl pox virus, pseudorabies and derivatives of SV40), P1-based artificial
chromosomes, yeast plasmids, yeast artificial chromosomes, and any other vectors
specific for specific hosts of interest (such as Bacillus, Aspergillus, yeast, etc.). Thus, for
example, the DNA may be included in any one of a variety of expression vectors for
expressing a polypeptide. Such vectors include chromosomal, nonchromosomal and
synthetic DNA sequences. Large numbers of suitable vectors are known to those of skill
in the art, and are commercially available. The following vectors are provided by way of
example: ZAP Express, Lambda ZAP®-CMV, Lambda ZAP® II, Lambda gt10,
Lambda gt11, pMyr, pSos, pCMV-Script, pCMV-Script XR, pBK Phagemid, pBK-CMV,
pBK-RSV, pBluescript II Phagemid, pBluescript II KS +, pBluescript II SK +,
pBluescript II SK -, Lambda FIX II, Lambda DASH II, Lambda EMBL3 and
EMBL4, EMBL3, EMBL4, SuperCos I and pWE15, pWE15, SuperCos I, pPCR-Script
Amp, pPCR-Script Cam, pCMV-Script, pBC KS +, pBC KS -, pBC SK +,
pBC SK -, psiX174, pNH8A, pNH16a, pNH18A, pNH46A (Stratagene); pT7Blue,
pSTBlue, pCITE, pET, pTriEx, pForce (Novagen); pIND-E, pIND Vector,
pIND/Hygro, pIND(SP1)/Hygro, pIND/GFP, pIND(SP1)/GFP, pIND/V5-His and
pIND(SP1)/V5-His Tag, pIND TOPO TA, pShooter™ Targeting Vectors, pTracer™
GFP Reporter Vectors, pcDNA® Vector Collection, EBV Vectors, Voyager™ VP22
Vectors, pVAX1 - DNA vaccine vector, pcDNA4/His-Max, pBC1 Mouse Milk
System (Invitrogen); pQE70, pQE60, pQE-9, pQE-16, pQE-30/pQE-80,
pQE-31/pQE-81, pQE-32/pQE-82, pQE-40, pQE-100 Double Tag (Qiagen); pTRC99a,
pKK223-3, pKK233-3, pDR540, pRIT5, pWLNEO, pSV2CAT, pOG44, pXT1, pSG
(Stratagene), pSVK3, pBPV, pMSG, pSVL (Pharmacia). However, any other plasmid or
vector may be used as long as it is replicable and viable in the host.

The nucleic acid sequence in the expression vector is operatively linked
to an appropriate expression control sequence(s) (promoter) to direct mRNA synthesis.
Particular named bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, PL,
SP6, trp, lacUV5, PBAD, araBAD, araB, trc, proU, p-D-HSP, HSP, GAL4 UAS/E1b,
TK, GAL1, CMV/TetO2 Hybrid, EF-1α CMV, EF, EF-1α,
ubiquitin C, rsv-ltr, rsv, β-lactamase, nmt1, and gal10. Eukaryotic promoters include
CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from
retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and
promoter is well within the level of ordinary skill in the art. The expression vector also
contains a ribosome binding site for translation initiation and a transcription terminator.
The vector may also include appropriate sequences for amplifying expression. Promoter
regions can be selected from any desired gene using CAT (chloramphenicol
acetyltransferase) vectors or other vectors with selectable markers.

In addition, the expression vectors can contain one or more selectable
marker genes to provide a phenotypic trait for selection of transformed host cells such as
dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or such as
tetracycline or ampicillin resistance in E. coli.

The nucleic acid sequence(s) selected, cloned and sequenced as
hereinabove described can additionally be introduced into a suitable host to prepare a
library which is screened for the desired enzyme activity. The selected nucleic acid is
preferably already in a vector which includes appropriate control sequences whereby a
selected nucleic acid encoding an enzyme may be expressed, for detection of the desired
activity. The host cell can be a higher eukaryotic cell, such as a mammalian cell, or a
lower eukaryotic cell, such as a yeast cell, or the host cell can be a prokaryotic cell, such
as a bacterial cell. The selection of an appropriate host is deemed to be within the scope
of those skilled in the art from the teachings herein.

In some instances it may be desirable to perform an amplification of the
nucleic acid sequence present in a sample or a particular clone that has been isolated. In
this aspect the nucleic acid sequence is amplified by PCR or a similar reaction
known to those of skill in the art. Amplification kits are commercially
available to carry out such amplification reactions.

In addition, it is important to recognize that the alignment algorithms and
searchable database can be implemented in computer hardware, software or a
combination thereof. Accordingly, the isolation, processing and identification of nucleic
acid sequences and the corresponding polypeptides encoded by those sequences can be
implemented in an automated system.

Capillary-Based Screening

Figure 6A shows a capillary array (10) which includes a plurality of
individual capillaries (20) having at least one outer wall (30) defining a lumen (40). The
outer wall (30) of the capillary (20) can be one or more walls fused together. Similarly,
the wall can define a lumen (40) that is cylindrical, square, hexagonal or any other
geometric shape so long as the walls form a lumen for retention of a liquid or sample.
The capillaries (20) of the capillary array (10) are held together in close proximity to
form a planar structure. The capillaries (20) can be bound together, by being fused (e.g.,
where the capillaries are made of glass), glued, bonded, or clamped side-by-side. The
capillary array (10) can be formed of any number of individual capillaries (20). In an
aspect, the capillary array includes 100 to 4,000,000 capillaries (20). In one aspect, the
capillary array includes 100 to 500,000,000 capillaries (20). In one aspect, the capillary
array includes 100,000 capillaries (20). In one specific aspect, the capillary array (10)
can be formed to conform to a microtiter plate footprint, i.e., 127.76mm by 85.47mm,
with tolerances. The capillary array (10) can have a density of 500 to more than 1,000
capillaries (20) per cm2, or about 5 capillaries per mm2. For example, a microtiter plate
size array of 3 µm capillaries would have about 500 million capillaries.

The capillaries (20) can be formed with an aspect ratio of 50:1. In one
aspect, each capillary (20) has a length of approximately 10mm, and an internal diameter
of the lumen (40) of approximately 200 µm. However, other aspect ratios are possible,
and range from 10:1 to well over 1000:1. Accordingly, the thickness of the capillary
array can vary from 0.5mm to over 10cm. Individual capillaries (20) have an inner
diameter that ranges from 3-500 µm and 0-500nm. A capillary (20) having an internal
diameter of 200 µm and a length of 1 cm has a volume of approximately 0.3 µl. The
length and width of each capillary (20) is based on a desired volume and other
characteristics discussed in more detail below, such as evaporation rate of liquid from
within the capillary, and the like. Capillaries of the invention may include a volume as
low as 250 nanoliters/well.
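
The geometric relationships above can be checked with a short calculation. The following Python sketch is illustrative only: it assumes a cylindrical lumen and, for the capillary count, a square grid with a hypothetical center-to-center pitch, since the wall thickness and packing geometry are not specified. All function names are assumptions made for the example.

```python
import math


def lumen_volume_ul(inner_diameter_um, length_mm):
    """Volume of a cylindrical lumen in microliters (1 mm^3 equals 1 microliter)."""
    radius_mm = inner_diameter_um / 1000.0 / 2.0
    return math.pi * radius_mm ** 2 * length_mm


def aspect_ratio(inner_diameter_um, length_mm):
    """Capillary length divided by internal diameter, in matching units."""
    return length_mm * 1000.0 / inner_diameter_um


def capillaries_in_footprint(pitch_um, width_mm=127.76, height_mm=85.47):
    """Approximate capillary count on a square grid (an assumed packing).

    pitch_um is the assumed center-to-center spacing; the default footprint
    is the microtiter plate footprint given in the text.
    """
    per_row = int(width_mm * 1000.0 // pitch_um)
    per_col = int(height_mm * 1000.0 // pitch_um)
    return per_row * per_col
```

With these definitions, a capillary of 200 µm internal diameter and 1 cm length holds about 0.31 µl and has an aspect ratio of 50:1. Under the square-grid assumption, a pitch of roughly 4.7 µm would place on the order of 500 million capillaries within the microtiter plate footprint.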

In accordance with one aspect of the invention, one or more particles are
introduced into each capillary (20) for screening. Suitable particles include cells, cell
clones, and other biological matter, chemical beads, or any other particulate matter. The
capillaries (20) containing particles of interest can be introduced with various types of
substances for causing an activity of interest. The introduced substance can include a
liquid having a developer or nutrients, for example, which assists in cell growth and
which results in the production of enzymes. Or, a chemical solution containing new
particles can cause a combining event with other chemical beads already introduced into
one or more capillaries (20). The particles and resulting activity of interest are screened
and analyzed using the capillary array (10) according to the present invention. In one
aspect, the activity produces a change in properties of matter within the capillary (20),
such as optical properties of the particles. Each capillary can act as a waveguide for
guiding detectable light energy or property changes to an analyzer.

The capillaries (20) can be made according to various manufacturing techniques. In one
particular aspect, the capillaries (20) are manufactured using a hollow-drawn technique.
A cylindrical, or other hollow shape, piece of glass is drawn out to continually longer
lengths according to known techniques. The piece of glass is preferably formed of
multiple layers. The drawn glass is then cut into portions of a specific length to form a
relatively large capillary. The capillary portions are next bundled into an array of
relatively large capillaries, and then drawn again to increasingly narrower diameters.
During the drawing process, or when the capillaries are formed to a desired width,
application of heat can fuse interstitial areas of adjacent capillaries together.

In an alternative aspect, a glass etching process is used. A solid tube of
glass can be drawn out to a particular width, cut into portions of a specific length, and
drawn again. Then, each solid tube portion is center-etched with an acid or other etchant
to form a hollow capillary. The tubes can be bound or fused together before or after the
etch process. A number of capillary arrays (10) can be connected together to form an
array of arrays (12), as shown in Figure 6B. The capillary arrays (10) can be glued
together. Alternatively, the capillary arrays (10) can be fused together. According to
this technique, the array of arrays (12) can have any desired size or footprint, formed of
any number of high-precision capillary arrays (10).

A large number of materials can be suitably used to form a capillary
array according to the invention and depending on the manufacturing technique used,
including without limitation, glass, metal, semiconductors such as silicon, quartz,
ceramics, or various polymers and plastics including, among others, polyethylene,
polystyrene, and polypropylene. The internal walls of the capillary array, or portions
thereof, may be coated or silanized to modify their surface properties. For example, the
hydrophilicity or hydrophobicity may be altered to promote or reduce wicking or
capillary action, respectively. The coating material includes, for example, ligands such
as avidin, streptavidin, antibodies, antigens, and other molecules having specific binding
affinity or which can withstand thermal or chemical sterilization.

While the above-described manufacturing techniques and materials yield
high precision micro-sized capillaries and capillary arrays, the size, spacing and
alignment of the capillaries within an array may be non-uniform. In some instances, it is
desirable to have two capillary arrays make contact in as close alignment as possible,
such as, for example, to transfer liquid from capillaries in a first capillary array to
capillaries in a second capillary array. One capillary array according to the invention
may be cut horizontally along its thickness, and separated to form two capillary arrays.
The two resulting capillary arrays will each include at least one surface having capillary
openings of substantially identical size, spacing and alignment, and suitable for
contacting together for transferring liquid from one resulting capillary array to the other.

Figure 7 shows a horizontal cross section of a portion of an array of
capillaries (20). Capillary (20) is shown having a first cylindrical wall (30), a lumen
(40), a second exterior wall (50), and interstitial material (60) separating the capillary
tubes in the array (10). In this aspect, the cylindrical wall (30) is comprised of a sleeve
glass, while exterior wall (50) is comprised of an extra mural absorption (EMA) glass to
minimize optical cross-talk among neighboring capillaries (20).

A capillary array may optionally include reference indicia (22) for
providing a positional or alignment reference. The reference indicia (22) may be formed
of a pad of glass extending from the surface of the capillary array, or embedded in the
interstitial material (60). In one aspect, the reference indicia (22) are provided at one or
more corners of a microtiter plate formed by the capillary array. According to the
aspect, a corner of the plate or set of capillaries may be removed, and replaced with the
reference indicia (22). The reference indicia (22) may also be formed at spaced intervals
along a capillary array, to provide an indication of a subset of capillaries (20).

Figure 8 depicts a vertical cross-section of a capillary of the invention.
The capillary (20) includes a first wall (30) defining a lumen (40), and a second wall (50)
surrounding the first wall (30). In one aspect, the second wall (50) has a lower index of
refraction than the first wall (30). In one aspect, the first wall (30) is sleeve glass having
a high index of refraction, forming a waveguide in which light from excited fluorophores
travels. In the exemplary aspect, the second wall (50) is black EMA glass, having a low
index of refraction, forming a cladding around the first wall (30) against which light is
refracted and directed along the first wall (30) for total internal reflection within the
capillary (20). The second wall (50) can thus be made with any material that reduces the
"cross-talk" or diffusion of light between adjacent capillaries. Alternatively, the inside
surface of the first wall (30) can be coated with a reflective substance to form a mirror,
or mirror-like structure, for specular reflection within the lumen (40).
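
The condition for total internal reflection at the interface between the first wall and the second wall follows from Snell's law and can be illustrated as below. The function name is an assumption made for the example, and the index values used in the note that follows are illustrative values, not indices specified for the sleeve glass or EMA glass.

```python
import math


def critical_angle_deg(n_core, n_cladding):
    """Critical angle, in degrees from the interface normal, for total internal reflection.

    Light striking the interface at a larger angle of incidence is totally
    reflected back into the higher-index (core) material.
    """
    if n_cladding >= n_core:
        raise ValueError("total internal reflection requires n_cladding < n_core")
    return math.degrees(math.asin(n_cladding / n_core))
```

For example, with assumed indices of 1.60 for the first wall and 1.50 for the second wall, the critical angle is about 69.6 degrees; a larger difference between the two indices lowers the critical angle and allows more of the emitted light to be guided.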

Many different materials can be used in forming the first and second
walls, creating different indices of refraction for desired purposes. A filtering material
can be formed around the lumen (40) to filter energy to and from the lumen (40) as
depicted in Figure 9. In one aspect, the inner wall of the first wall (30) of each capillary
of the array, or portion of the array, is coated with the filtering material. In another
aspect, the second wall (50) includes the filtering material. For instance, the second wall
(50) can be formed of the filtering material, such as filter glass for example, or in one
exemplary aspect, the second wall (50) is EMA glass that is doped with an appropriate
amount of filtering material. The filtering material can be formed of a color other than
black and tuned for a desired excitation/emission filtering characteristic.

The filtering material allows transmission of excitation energy into the
lumen (40), and blocks emission energy from the lumen (40) except through one or more
openings at either end of the capillary (20). In Figure 9, excitation energy is illustrated
as a solid line, while emission energy is indicated by a broken line. When the second
wall (50) is formed with a filtering material as shown in Figure 9, certain wavelengths of
light representing excitation energy are allowed through to the lumen (40), and other
wavelengths of light representing emission energy are blocked from exiting, except as
directed within and along the first wall (30). The entire capillary array, or a portion
thereof, can be tuned to a specific individual wavelength or group of wavelengths, for
filtering different bands of light in an excitation and detection process.

A particle (70) is depicted within the lumen (40). During use, an
excitation light is directed into the lumen (40) contacting the particle (70) and exciting a
reporter fluorescent material causing emission of light. The emitted light travels the
length of the capillary until it reaches a detector. One advantage of an aspect of the
present invention, where the second wall (50) is black EMA glass, is that the emitted
light cannot cross contaminate adjacent capillary tubes in a capillary array. In addition,
the black EMA glass refracts and directs the emitted light towards either end of the
capillary tube thus increasing the signal detected by an optical detector (e.g., a CCD
camera and the like).

In a detection process using a capillary array of the invention, an optical
detection system is aligned with the array, which is then scanned for one or more bright
spots, representing either a fluorescence or luminescence associated with a "positive."
The term "positive" refers to the presence of an activity of interest. Again, the activity
can be a chemical event, or a biological event.
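
The scan for bright spots can be sketched, by way of example only, as a threshold test over a grid of per-capillary intensity readings. The data layout (a list of rows of numeric intensities), the function name and the threshold value are all assumptions made for the example; the detection system itself is not limited to this approach.

```python
def find_positives(intensities, threshold):
    """Return (row, column) positions whose signal exceeds the threshold.

    intensities is an assumed layout: one list per row of the array, with one
    numeric reading per capillary.
    """
    return [
        (r, c)
        for r, row in enumerate(intensities)
        for c, value in enumerate(row)
        if value > threshold
    ]
```

In practice the threshold would be set relative to the background signal of the array, and the reported positions would be used to locate the capillaries containing a "positive" for recovery.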

Figure 10 depicts a general method of sample screening using a capillary
array (10) according to the invention. In this depiction, capillary array (10) is immersed
or contacted with a container (100) containing particles of interest. The particles can be
cells, clones, molecules or compounds suspended in a liquid. The liquid is wicked into
the capillary tubes by capillary action. The natural wicking that occurs as a result of
capillary forces obviates the need for pumping equipment and liquid dispensers. A
substrate for measuring biological activity (e.g., enzyme activity) can be contacted with
the particles either before or after introduction of the particles into the capillaries in the
capillary array. The substrate can include clones of a cell of interest, for example. The
substrate can be introduced simultaneously into the capillaries by placing an open end of
the capillaries in the container (100) containing a mixture of the particle-bearing liquid
and the substrate. In some aspects, it is a goal to achieve a certain concentration of
particles of interest. A particular concentration of particles may also be achieved by
dilution. Figures 13A-C show one such process, which is described below.
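
As a hedged illustration of how a dilution might be chosen, the following Python sketch computes the fold dilution needed to reach a target average number of particles per capillary, and estimates the fraction of capillaries holding exactly k particles. The Poisson model of random loading is an assumption introduced for the example and is not stated in the text; the function names are likewise hypothetical.

```python
import math


def dilution_factor(stock_per_ul, capillary_volume_ul, target_per_capillary):
    """Fold dilution of a stock suspension needed to reach the target mean occupancy."""
    target_per_ul = target_per_capillary / capillary_volume_ul
    return stock_per_ul / target_per_ul


def poisson_fraction(mean_per_capillary, k):
    """Expected fraction of capillaries holding exactly k particles.

    Assumes particles distribute randomly and independently (Poisson model).
    """
    return math.exp(-mean_per_capillary) * mean_per_capillary ** k / math.factorial(k)
```

For example, with a capillary volume of 0.25 µl and a stock of 1,000 particles per µl, a 250-fold dilution gives an average of one particle per capillary; under the Poisson assumption about 37 percent of capillaries would then be empty and about 37 percent would hold exactly one particle.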

Alternatively, the particle-bearing liquid may be wicked a portion of the
way into the capillaries, and then the substrate is wicked into a remaining portion of the
capillaries. The mixture in the capillaries can then be incubated for producing a desired
activity. The incubation can be for a specific period of time and at an appropriate
temperature necessary for cell growth, for example, or to allow the substrate to
permeabilize the cell membrane to produce an optically detectable signal, or for a period
of time and at a temperature for optimum enzymatic activity. The incubation can be
performed, for example, by placing the capillary array in a humidified incubator or in an
apparatus containing a water source to ensure reduced evaporation within the capillary
tubes. Evaporative loss may be reduced by increasing the relative humidity (e.g., by
placing the capillary array in a humidified chamber). The evaporation rate can also be
reduced by capping the capillaries with an oil, wax, membrane or the like. Alternatively,
a high molecular weight fluid such as various alcohols, or molecules capable of forming
a molecular monolayer, bilayers or other thin films (e.g., fatty acids), or various oils
(e.g., mineral oil) can be used to reduce evaporation.

Figure 11 illustrates a method for incubating a substrate solution
containing cells of interest. While only a single capillary (20) is shown in Figure 11 for
simplicity, it should be understood that the incubation method applies to a capillary array
having a plurality of capillaries (20). In accordance with one aspect, a first fluid is
wicked into the capillary (20) according to methods described above. The capillary (20)
containing the substrate solution and cells (32) is then introduced to a fluid bath (70)
containing a second liquid (72). The second liquid may or may not be the same as the
first. For instance, the first liquid may contain particles (32) from which an activity is
screened. The particles (32) are suspended in liquid within the lumen (40), and 
gradually migrate toward the top of the lumen (40) in the direction of the flow of liquid 
through the capillary (20) due to evaporation. The width of the lumen (40) at the open 
end of the capillary (20) is sized to provide a particular surface area of liquid at the top of 
the lumen (40), for controlling the amount and rate of evaporation of the liquid mixture. 
By controlling the environment (68) near the non-submersed end of the capillary (20), 
the first liquid from within the capillary (20) will evaporate, and will be replenished by 
the second liquid (72) from the fluid bath (70). 

The amount of evaporation is balanced against possible diffusion of the 
contents of the capillary (20) into the liquid (72), and against possible mechanical 
mixing of the capillary contents with the liquid (72) due to vibration and pressure 
changes. The greater the width of the lumen (40), the larger the amount of mechanical 
mixing. Therefore, the temperature and humidity level in the surrounding environment 
may be adjusted to produce the desired evaporative cycle, and the lumen (40) width is 
sized to minimize mechanical mixing, in addition to producing a desired evaporation rate.
The non-submersed open end of the capillary (20) may also be capped to create a 
vacuum force for holding the capillary contents within the capillary, and minimizing 
mechanical mixing and diffusion of the contents within the liquid (72). However, when
capped, the capillary (20) will not experience evaporation. 

The liquid (72) can be supplemented with nutrients (74) to support a 
greater likelihood or rate of activity of the particles (32). For example, oxygen can be 
added to the liquid to nourish cells or to optimize the incubation environment of the 
cells. In another example, the liquid (72) can contain a substrate or a recombinant clone, 
or a developer for the particles (32). The cells can be optimally cultured by controlling 
the amount and rate of evaporation. For instance, by decreasing relative humidity of the 
environment (68), evaporation from the lumen (40) is increased, thereby increasing a 
rate of flow of liquid (72) through the capillary (20). Another advantage of this method
is the ability to control conditions within the capillary (20) and the environment (68) in
ways that are not otherwise possible.
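
The relationship between humidity, lumen width and replenishing flow can be illustrated with a rough first-order sketch. It assumes that, at steady state, the liquid drawn from the bath equals the liquid evaporated, and that evaporation scales with the open liquid surface area and with the humidity deficit (1 - RH). The function name and the coefficient k_nl_per_mm2_h are hypothetical and are not taken from the specification.

```python
import math


def replenishment_flow_nl_per_h(lumen_width_um, relative_humidity,
                                k_nl_per_mm2_h=50.0):
    """Estimate evaporation-driven flow through one capillary, in nL/h.

    Assumptions (illustrative only): flux is proportional to the open
    liquid surface area and to (1 - RH); k_nl_per_mm2_h is a made-up
    coefficient, not a measured value.
    """
    if not 0.0 <= relative_humidity <= 1.0:
        raise ValueError("relative_humidity must be between 0 and 1")
    radius_mm = (lumen_width_um / 1000.0) / 2.0
    area_mm2 = math.pi * radius_mm ** 2  # liquid surface at the open end
    return k_nl_per_mm2_h * area_mm2 * (1.0 - relative_humidity)


# Lower humidity gives faster evaporation and therefore faster replenishment.
for rh in (0.95, 0.80, 0.50):
    flow = replenishment_flow_nl_per_h(200, rh)
    print(f"RH {rh:.0%}: {flow:.3f} nL/h")
```

Under this model, doubling the lumen width quadruples the evaporating surface, which is consistent with sizing the lumen to set the evaporation rate.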

A relatively high humidity level of the environment will slow the rate of 
evaporation and keep more liquid within the capillary (20). If a temperature differential
exists between a capillary array (10) and its environment, however, condensation can
form on or near the ends of tightly-packed capillaries of the capillary array. Figure 12A 
shows a portion of a capillary array (10) of the invention, to depict a situation in which a 
condensation bead (80) forms on the outer edge surface of several capillary walls (30),
creating a potential conduit or bridge for "cross-talk" of matter between adjacent
capillary tubes (20). The outer edge surface of the capillary walls (30) is preferably a 
planar surface. In an aspect in which the wall (30) of the capillary (20) is glass, the outer 
edge surface of the capillary wall (30) can be polished glass. 

In order to minimize the effects of such condensation, a hydrophobic
coating (35) is provided over the outer edge surface of the capillary walls (30), as
depicted in Figure 12B. The coating (35) reduces the tendency for water or other liquid
to accumulate near the outer edge surface of the capillary wall (30). Condensation will
either form as smaller beads (82), be repelled from the surface of the capillary array, or
form entirely over an opening to the lumen (40). In the latter case, the condensation
bead (80) can form a cap to the capillary (20). In one aspect, the hydrophobic coating
(35) is TEFLON. In one configuration, the coating (35) covers only the outer edge 
surfaces of the capillary walls (30). In another configuration, the coating (35) can be 
formed over both the interstitial material (60) and the outer edge surfaces of the capillary 
walls (30). Another advantage of a hydrophobic coating (35) over the outer edge surface
of the capillary tubes arises during the initial wicking process, when some fluidic
material in the form of droplets will tend to stick to the surface at which the fluid is
introduced. The coating (35) minimizes extraneous fluid forming on the surface of a
capillary array (10), dispensing with a need to shake or knock the extraneous fluid from
the surface.

In some instances, it is necessary to have more than one component in a
capillary that are not premixed, and which can be later combined by dilution or mixing.
Figures 13A-C show a dilution process that may be used to achieve a particular 
concentration of particles. In one aspect employing dilution, a bolus of a first 
component (82) is wicked into a capillary (20) by capillary action until only a portion of
the capillary (20) is filled. In one particular aspect, pressure is applied at one end of the
capillary (20) to prevent the first component from wicking into the entire capillary (20).
The end (21) of the capillary may be completely or partially capped to provide the
pressure. An amount of air (84) is then introduced into the capillary adjacent the first
component. The air (84) can be introduced by any number of processes. One such 
process includes moving the first component (82) in one direction within the capillary 
until a suitable amount of the air (84) is introduced behind the first component (82).
Further movement of the first component (82) by a pulling and/or pushing pressure
causes a piston-like action by the first component (82) on the air. The capillary (20) or 
capillary array is then contacted to a second component (86). The second component 
(86) is preferably pulled into the capillary (20) by the piston-like action created by 
movement of the first component (82), until a suitable amount of the second component
(86) is provided in the capillary, separated from the first component by the air (84). One
of the first or second components may contain one or more particles of interest, and the 
other of the components may be a developer of the particles for causing an activity of 
interest. The capillary or capillary array can then be incubated for a period of time to 
allow the first and second components to reach an optimal temperature, or for a
sufficient time to allow cell growth, for example. The air bubble separating the two
components can be disrupted in order to mix the two components together and
initiate the desired activity. Pressure can be applied to collapse the bubble. In one
example, the mixture of the first and second components starts an enzymatic activity to 
achieve a multi-component assay. 

Paramagnetic beads contained within a capillary (20) can be used to
disrupt the air bubble and/or mix the contents of the capillary (20) or capillary array (10).
For example, Figure 14A and 9B depict an aspect of the invention in which 
paramagnetic beads are magnetically moved from one location to another location. The 
paramagnetic beads are attracted by magnetic fields applied in proximity to the capillary
or capillary array. By alternating or adjusting the location of the magnetic field with
respect to each capillary, the paramagnetic beads will move within each capillary to mix 
the liquid therein. Mixing the liquid can improve cell growth by increasing aeration of 
the cells. The method also improves consistency and detectability of the liquid sample 
among the capillaries. 

In another aspect, a method of forming a multi-component assay includes
providing one or more capsules of a second component within a first component. The
second component capsules can have an outer layer of a substance that melts or dissolves
at a predetermined temperature, thereby releasing the second component into the first
component and combining particles among the components. A thermally activated 
enzyme may be used to dissolve the outer layer substance. Alternatively, a "release on 
command" mechanism that is configured to release the second component upon a
predetermined event or condition may also be used.

In another aspect, recombinant clones containing a reporter construct or a 
substrate are wicked into the capillary tubes of the capillary array. In this aspect, it is not 
necessary to add a substrate as the reporter construct or substrate contained in the clone 
can be readily detected using techniques known in the art. For example, a clone
containing a reporter construct such as green fluorescent protein can be detected by
exposing the clone or substrate within the clone to a wavelength of light that induces 
fluorescence. Such reporter constructs can be implemented to respond to various culture 
conditions or upon exposure to various physical stimuli (including light and heat). In 
addition, various compounds can be screened in a sample using similar techniques. For
example, a compound detectably labeled with a fluorescent molecule can be readily
detected within a capillary tube of a capillary array.

In yet another aspect, instead of dilution, a fluorescence-activated cell 
sorter (FACS) is used to separate and isolate clones for delivery into the capillary array. 
In accordance with this aspect, a precise number of clones (one or more) per capillary
tube can be achieved. In yet another aspect, cells within a capillary are subjected to a
lysis process. A chemical is introduced within one of the components to lyse the cells,
causing them to burst.

Some assays may require an exchange of media within the capillary. In a 
media exchange process, a first liquid containing the particles is wicked into a capillary.
The first liquid is removed, and replaced with a second liquid while the particles remain
suspended within the capillary. Addition of the second liquid to the capillary and contact 
with the particles can initialize an activity, such as an assay, for example. The media 
exchange process may include a mechanism by which the particles in the capillary are 
physically maintained in the capillary while the first liquid is removed. In one aspect,
the inner walls of the capillary array are coated with antibodies to which cells bind.
Then, the first liquid is removed, while the cells remain bound to the antibodies, and the
second liquid is wicked into the capillary. The second liquid could be adapted to cause
the cells to unbind if desirable. In an alternative aspect, one or more walls of the
capillary can be magnetized. The particles are also magnetized and attracted to the 
walls. In still another aspect, magnetized particles are attracted and held against one side 
of the capillary upon application of a magnetic field near that side. 

The capillary array is analyzed for identification of capillaries having a
detectable signal, such as an optical signal (e.g., fluorescence), by a detector capable of
detecting a change in light production or light transmission, for example. Detection may 
be performed using an illumination source that provides fluorescence excitation to each 
of the capillaries in the array, and a photodetector that detects resulting emission from
the fluorescence excitation. Suitable illumination sources include, without limitation, a
laser, incandescent bulb, light emitting diode (LED), arc discharge, or photomultiplier 
tube. Suitable photodetectors include, without limitation, a photodiode array, a charge- 
coupled device (CCD), or charge injection device (CID). 

In one aspect, shown with reference to Figure 15, a detection system
includes a laser source (82) that produces a laser beam (84). The laser beam (84) is
directed into a beam expander (85) configured to produce a wider or less divergent beam
(86) for exciting the array of capillaries (20). Suitable laser sources include argon or ion 
lasers. For this aspect, a cooled CCD can be used. 

The light generated by, for example, enzymatic activation of a
fluorescent substrate is detected by an appropriate light detector or detectors positioned
adjacent to the apparatus of the invention. The light detector may be, for example, film, 
a photomultiplier tube, photodiode, avalanche photo diode, CCD or other light detector 
or camera. The light detector may be a single detector to detect sequential emissions, 
such as a scanning laser. Or, the light detector may include a plurality of separate
detectors to detect and spatially resolve simultaneous emissions at single or multiple
wavelengths of emitted light. The light emitted and detected may be visible light or may
be emitted as non-visible radiation such as infrared or ultraviolet radiation. A thermal
detector may be used to detect an infrared emission. The detector or detectors may be 
stationary or movable. 

Illumination can be channeled to particles of interest within the array by
means of lenses, mirrors and fiber optic light guides or light conduits (single, multiple,
fixed, or moveable) positioned on or adjacent to at least one surface of the capillary
array. A detectable signal, such as emitted light or other radiation, may also be
channeled to the detector or detectors by the use of such mechanisms. The photodetector
can comprise a CCD, CID or an array of photodiode elements. The position
of one or more capillaries having an optical signal can then be determined from the
optical input from each element. Alternatively, the array may be scanned by a scanning
confocal or phase-contrast fluorescence microscope or the like, where the array is, for 
example, carried on a movable stage for movement in an X-Y plane as the capillaries in
the array are successively aligned with the beam to determine the capillary array
positions at which an optical signal is detected. A CCD camera or the like can be used in
conjunction with the microscope. The detection system can be computer-automated
for rapid screening and recovery. In one aspect, the system uses a telecentric lens for
detection. The magnification of the lens can be adjusted to focus on a subset of
capillaries in the capillary array. At one extreme, for instance, the detection system can
have a 1:1 correlation of pixels to capillaries. Upon detecting a signal, the focus can be
adjusted to determine other properties of the signal. Having more pixels per capillary
allows for subsequent image processing of the signal. 

Where a chromogenic substrate is used, the change in the absorbance 
spectrum can be measured, such as by using a spectrophotometer or the like. Such 
measurements are usually difficult when dealing with a low-volume liquid because the
optical path length is short. However, the capillary approach of the present invention
permits small volumes of liquid to have long optical path lengths (e.g., longitudinally 
along the capillary tube), thereby providing the ability to measure absorbance changes 
using conventional techniques. 
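
The benefit of a long optical path can be made concrete with the Beer-Lambert law, A = epsilon * c * l. The following minimal sketch compares absorbance measured across a lumen with absorbance measured along the capillary axis. The molar absorptivity, the concentration and the 8 mm capillary length are hypothetical values chosen only for illustration.

```python
def absorbance(molar_absorptivity, concentration_molar, path_length_cm):
    """Beer-Lambert law: A = epsilon * c * l."""
    return molar_absorptivity * concentration_molar * path_length_cm


EPSILON = 15000.0        # L/(mol*cm), hypothetical chromophore
CONCENTRATION = 10e-6    # mol/L, hypothetical product concentration

across_lumen = absorbance(EPSILON, CONCENTRATION, 0.02)  # 200 um across
along_axis = absorbance(EPSILON, CONCENTRATION, 0.8)     # 8 mm lengthwise

print(f"across lumen: {across_lumen:.4f}")
print(f"along axis:   {along_axis:.4f}")
print(f"gain:         {along_axis / across_lumen:.0f}x")
```

Because absorbance is linear in path length, the gain equals the ratio of the two path lengths regardless of the chromophore chosen.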

A fluid within a capillary will usually form a meniscus at each end. Any
light entering the capillary will be deflected toward the wall, except for paraxial rays,
which enter the meniscus curvature at its center. The paraxial rays create a small bright
spot in the middle of the capillary, representing the small amount of light that makes it
through. Measurement of the bright spot provides an opportunity to measure how much
light is being absorbed on its way through. In one aspect, a detection system includes the
use of two different wavelengths. A ratio of the intensities measured at a first and a second
wavelength indicates how much light is absorbed in the capillary. Alternatively, two
images of the capillary
can be taken, and a difference between them can be used to ascertain a differential
absorbance of a chemical within the capillary. 

In absorbance detection, only light in the center of the lumen can travel 
through the capillary. However, if at least one meniscus is flattened, the optical
efficiency is improved. The meniscus can be kept flat under a number of circumstances,
such as during a continuous cycle of evaporation, discussed above with reference to
Figure 11. In that aspect, the fluid bath can be contained in a clear, light-passing
container, and the light source can be directed through the fluid bath into the capillary. 

In another aspect, bioactivity or a biomolecule or compound is detected
by using various electromagnetic detection devices, including, for example, optical,
magnetic and thermal detection. In yet another aspect, radioactivity can be detected
within a capillary tube using detection methods known in the art. The radiation can be
detected at either end of the capillary tube. Other detection modes include, without
limitation, luminescence, fluorescence polarization, and time-resolved fluorescence.
Luminescence detection includes detecting emitted light that is produced by a chemical
or physiological process associated with a sample molecule or cell. Fluorescence
polarization detection includes excitation of the contents of the lumen with polarized
light. Under such conditions, a fluorophore emits polarized light for a particular
molecule. However, the emitting molecule can be moving and changing its angle of
orientation, and the polarized light emission could become random.

Time-resolved fluorescence includes reading the fluorescence at a
predetermined time after excitation. For a relatively long-life fluorophore, the molecule
is flashed with excitation energy, which produces emissions from the fluorophore as well
as from other particles within the substrate. Emissions from the other particles cause
background fluorescence. The background fluorescence normally has a short lifetime
relative to the long-life emission from the fluorophore. The emission is read after
excitation is complete, at a time when the short-lived background fluorescence has
usually decayed, and during a time in which the long-life fluorophore continues to
fluoresce. Time-resolved fluorescence is therefore a technique for suppressing background
fluorescent activity.
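
The effect of delaying the read can be sketched numerically. The example below assumes that both the long-life fluorophore and the background decay as single exponentials, which is a common simplification rather than something stated in the specification. The lifetimes and the initial signal-to-background ratio are hypothetical.

```python
import math


def remaining_fraction(lifetime_ns, delay_ns):
    """Fraction of emission intensity left after delay_ns (single exponential)."""
    return math.exp(-delay_ns / lifetime_ns)


def gated_signal_to_background(delay_ns,
                               probe_lifetime_ns=500_000.0,   # hypothetical long-life label
                               background_lifetime_ns=5.0,    # hypothetical background
                               initial_ratio=0.1):
    """Signal-to-background ratio when the read starts delay_ns after excitation."""
    probe = initial_ratio * remaining_fraction(probe_lifetime_ns, delay_ns)
    background = remaining_fraction(background_lifetime_ns, delay_ns)
    return probe / background


for delay in (0.0, 25.0, 50.0, 100.0):
    ratio = gated_signal_to_background(delay)
    print(f"delay {delay:5.1f} ns -> signal/background {ratio:.3g}")
```

With these assumed lifetimes, a delay of a few background lifetimes removes most of the background while the long-life emission is essentially unchanged.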

Recovery of putative hits (cells or clones producing a detectable or 
optical signal) can be facilitated by using position feedback from the detection system to
automate positioning of a recovery device (e.g., a needle pipette tip or capillary tube).
Figure 16 shows an example of a recovery system (100) of the invention. In this
example, a needle (105) is selected and connected to a recovery mechanism (106). A
support table (102) supports a capillary array (10) and a light source (104). The light
source is used with a camera assembly (110) to find an X, Y and Z coordinate location of
the needle (105) connected to the recovery mechanism (106). The support table is moved
relative to the capillary array in the X and Y axes, in order to place the capillary array 
(10) underneath the needle (105), where the capillary array (10) contains a "hit." 
According to various aspects, each section of a recovery system can be moved or kept
stationary.

The recovery mechanism (106) then provides a needle (105) to a
capillary containing a "hit" by overlapping the tip of the needle (105) with the capillary
containing the "hit," in the Z direction, until the tip of the needle engages the capillary
opening. In order to avoid damage to the capillary itself, the needle may be attached to a
spring or be of a material that flexes. Once the needle is in contact with the opening of the
capillary, the sample can be aspirated or expelled from the capillary. Alternatively, the capillary
array may be moved relative to a stationary needle (105), or both moved. 

In a specific exemplary aspect of a recovery technique, a single camera is 
used for determining a location of a recovery tool, such as the tip of a needle, in the
Z-plane. The Z-plane determination can be accomplished using an auto-focus algorithm,
or proximity sensor used in conjunction with the camera. Once the proximity of the 
recovery tool in Z is known, an image processing function can be executed to determine 
a precise location of the recovery tool in X and Y. In one aspect, the recovery tool is 
back-lit to aid the image processing. Once the X and Y coordinate locations are known,
the capillary array can be moved in X and Y relative to the precise location of the
recovery tool, which can be moved along the Z axis for coupling with a target capillary.

In an alternative specific aspect of a recovery technique, two or more 
cameras are used for determining a location of the recovery tool. For instance, a first 
camera can determine X and Z coordinate locations of the recovery tool, such as the X, Z
location of a needle tip. A second camera can determine Y and Z coordinate locations of
the recovery tool. The two sets of coordinates can then be multiplexed for a complete
X,Y,Z coordinate location. Next, the movement of the capillary array relative to the
recovery tool can be executed substantially as above. 
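
One way the two coordinate sets might be combined is sketched below. The function name, the averaging of the two Z readings and the z_tolerance check are assumptions made for illustration. The specification does not say how the shared Z value is reconciled.

```python
def combine_camera_views(xz, yz, z_tolerance=0.05):
    """Merge an (x, z) fix from one camera with a (y, z) fix from another.

    Both cameras observe Z, so the two readings are cross-checked and
    averaged. z_tolerance is a hypothetical limit in the same units as
    the coordinates.
    """
    x, z_from_first = xz
    y, z_from_second = yz
    if abs(z_from_first - z_from_second) > z_tolerance:
        raise ValueError("cameras disagree on Z; recalibration may be needed")
    return (x, y, (z_from_first + z_from_second) / 2.0)


# Hypothetical needle-tip readings from the two cameras.
tip = combine_camera_views((12.40, 3.00), (7.85, 3.02))
print(tip)
```

The redundant Z reading doubles as a consistency check before the needle is lowered toward a target capillary.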

The sample can be expelled by, for example, injecting a blast of inert gas 
or fluid into the capillary and collecting the ejected sample in a collection device at the
opposite end of the capillary. The diameter of the collection device can be larger than or
equal to the diameter of the capillary. The collected sample can then be further 
processed by, for example, extracting polynucleotides, proteins or by growing the clone 
in culture. 

In another aspect, the sample is aspirated by use of a vacuum. In this
aspect, the needle contacts, or nearly contacts, the capillary opening and the sample is
"vacuumed" or aspirated from the capillary tube onto or into a collection device. The
collection device may be a microfuge tube or a filter located proximal to the opening of
the needle, as depicted in Figures 17A-D. Figure 17D shows further processing of a
sample collected onto a filter following aspiration of the sample from the capillary. The
sample includes particles, such as cells, proteins, or nucleic acids, which when present
on the filter, can be delivered into a collection device. Suitable collection devices
include a microfuge tube, a capillary tube, microtiter plate, cell culture plate, and the
like. The delivery of the sample can be accomplished by forcing another media, air or 
other fluid through the filter in the reverse direction. 

The sample can also be expelled from a capillary by a sample ejector. In
one aspect, the ejector is a jet system where sample fluid at one end of the capillary tube
is subjected to a high temperature, causing fluid at the other end of the capillary tube to
eject out. The heating of fluid can be accomplished mechanically, by applying a heated
probe directly into one end of a capillary tube. The heated probe preferably seals the one
end, heats fluid in contact with the probe, and expels fluid out the other end of the
capillary tube. The heating and expulsion may also be accomplished electronically. For
instance, in an aspect of the jet system, at least one wall of a capillary tube is metalized. 
A heating element is placed in direct contact with one end of the wall. The heating 
element may completely close off the one end, or partially close the one end. The
heating element charges up the metalized wall, which generates heat within the fluid.
The heating element can be an electricity source, such as a voltage source, or a current
source. In still yet another aspect of a jet system, a laser applies heat pulses to the fluid
at one end of the capillary tube. 

Other systems for expelling fluid from a capillary tube of the invention 
are possible. An electric field may be created in or near the fluid to create an
electrophoretic reaction, which causes the fluid to move according to electromotive force
created by the electric field. An electromagnetic field may also be used. In one aspect,
one or more capillaries contain, in addition to the fluid, magnetically charged particles to 
help move the fluid or magnetized particles out of the capillary array. Each capillary of 
an array of capillaries is individually addressable, i.e., the contents of each well can be
ascertained during screening. In one aspect, a quantum-dot-tagged microbead method
and arrangement is used. In such a method and arrangement, tens of thousands of 
unique fluorescent codes can be generated. The assay of interest is attached to a coded 
bead, and multi-spectral imaging is used to measure both the assay and the beads/codes 
simultaneously. There will always be some capillaries that get multiple beads and some
that get none.

For an array which contains approximately 100,000 capillaries, one 
approach is to fill the 100,000 capillaries of the array with a solution that contains 10 
copies of 10,000 different coded beads (or 5 copies of 20,000 codes). Under normal
conditions, simple statistical analysis can be used to determine which of the wells have
single beads and maybe even the contents of every well. The chance of having any two
beads together in a well more than 5 times on any one capillary array platform is 
negligibly small. 
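
The kind of simple statistical analysis referred to above can be illustrated with a Poisson occupancy estimate. This sketch assumes that beads distribute independently and uniformly among the capillaries, which is an idealization and not a claim made in the specification. With 100,000 beads in 100,000 capillaries the mean occupancy is one bead per capillary.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k beads in a capillary for a given mean occupancy."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


capillaries = 100_000
beads = 10 * 10_000            # 10 copies of 10,000 codes
mean = beads / capillaries     # average beads per capillary

empty = poisson_pmf(0, mean)
single = poisson_pmf(1, mean)
multiple = 1.0 - empty - single

print(f"empty capillaries:       {empty:.1%}")
print(f"single-bead capillaries: {single:.1%}")
print(f"multi-bead capillaries:  {multiple:.1%}")
```

The same function can be reused with a different mean to see how diluting or concentrating the bead suspension shifts the fraction of single-bead capillaries.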

An advantage of the quantum-dots method is that only a single excitation 
band is needed. This allows a lot of flexibility for the assay (i.e., it can use a different
excitation band). Magnetic-coded beads may also be used to add another dimension to
the assay detection. A multi-spectral imaging system can then be used. Alternatively, a 
neural network application can be utilized for spectral decomposition. 

The myriad of microbes inhabiting this planet represent a tremendous
repository of biomolecules for pharmaceutical, agricultural, industrial and chemical
applications. The great majority of these microbes, estimated at near 99.5%, have
remained uncultured by modern microbiological methods due in large part to the
complex chemistries and environmental variables encountered in extreme or unusual
biotopes. Taking advantage of enzymes catalyzing chemical reactions in novel
pathways and evolved to function under environmental extremes is of great industrial
significance. This invention provides technologies to extract, optimize and
commercialize this robust catalytic diversity, within culture-independent, recombinant
approaches for the discovery of novel enzymes and biosynthetic pathways by tapping
into the biodiversity present in nature. Large, complex (>10^9 member) gene libraries are
constructed by direct isolation of DNA from selected microenvironments around the
world. These libraries are then expressed in various host systems and subjected to high
throughput screens specific for an activity of interest. Because in excess of 5000
different microbial genomes may be present in a single DNA library, ultra high
throughput methods are required to effectively screen this diversity and are crucial to the
success of this culture-independent, recombinant strategy.
The invention provides screening platforms and methods for use with a
Fluorescence Activated Cell Sorter (FACS). In FACS methodologies, cells are mixed
with substrates and then streamed past a detector to screen for a positive molecular
event. This signal could be a fluorescent signal resulting from the cleavage of an enzyme
substrate or a specific binding event. The greatest advantage of the use of a FACS
machine is throughput; up to 10^9 clones can be screened/day. Unfortunately, FACS-based
screening also has limitations including cell wall permeability of enzymes and
substrates/products and incubation times and temperatures. In addition, viability of host
cells post-sort and dependence on a single data point for each individual cell further limit
such technologies.

The development of the capillary array overcomes many of these
shortcomings. Like microtiter and solid phase screens, it combines the preservation of
native protein conformation with increased signal strength of clonal amplification. The
throughput, however, approaches that of selective assays and FACS-based assays.
Moreover, as array plates are reusable, the amount of plastic waste generated is greatly
reduced. Approximately 24 tons of plastic waste* is generated annually in screening
100,000 wells per day in a 96 well format (* Assuming 84 g/plate x 1000 plates/day x
260 days/year). Further, a typical screen of 100,000 wells on a robotic high throughput
screening system requires 261 384-well microtiter plates and over 24 hours of equipment
time versus less than 10 minutes to process a single plate. The enhancement of this
technology to densities of one million wells per plate is aimed at approaching the
throughput of selective assays and FACS-based assays while retaining the advantages of
a microtiter-based screen.
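
The waste and plate-count figures above can be reproduced with a few lines of arithmetic. The sketch below uses only the stated assumptions (84 g/plate, 1000 plates/day, 260 days/year). Reading "tons" as US short tons is an assumption made here because it reproduces the stated figure of approximately 24.

```python
import math

GRAMS_PER_PLATE = 84
PLATES_PER_DAY = 1000
DAYS_PER_YEAR = 260
KG_PER_SHORT_TON = 907.18474   # US short ton (2000 lb)

annual_kg = GRAMS_PER_PLATE * PLATES_PER_DAY * DAYS_PER_YEAR / 1000
metric_tonnes = annual_kg / 1000
short_tons = annual_kg / KG_PER_SHORT_TON

# Plates needed to hold 100,000 wells in each microtiter format.
plates_96 = math.ceil(100_000 / 96)
plates_384 = math.ceil(100_000 / 384)

print(f"annual plastic waste: {annual_kg:.0f} kg")
print(f"  = {metric_tonnes:.2f} metric tonnes = {short_tons:.1f} short tons")
print(f"96-well plates for 100,000 wells:  {plates_96}")
print(f"384-well plates for 100,000 wells: {plates_384}")
```

The 96-well count comes out slightly above 1000, so the footnote's 1000 plates/day appears to be a rounded figure.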

The first generation capillary array plates can be fabricated using
manufacturing techniques originally developed for the fiber optics industry, and currently
consist of 100,000 cylindrical compartments or wells contained within a 3.3" x 5"
reusable plate, the size of an SBS (Society for Biomolecular Screening) standard 96 well
microtiter plate. These wells are 200 µm in diameter (about the diameter of a human
hair) and act as discrete 250 nanoliter volume microenvironments in which isolated
clones can be grown and screened.
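
As a consistency check on these dimensions, the length of a cylindrical well can be derived from its volume and diameter. The sketch below treats each well as an ideal cylinder. The resulting length is an inference from the stated numbers and is not a figure given in the specification.

```python
import math


def cylinder_length_mm(volume_nl, diameter_um):
    """Length of an ideal cylinder with the given volume and diameter."""
    radius_mm = diameter_um / 1000.0 / 2.0
    cross_section_mm2 = math.pi * radius_mm ** 2
    volume_mm3 = volume_nl * 1e-3      # 1 nL = 0.001 mm^3
    return volume_mm3 / cross_section_mm2


length_mm = cylinder_length_mm(volume_nl=250, diameter_um=200)
total_volume_ml = 100_000 * 250 / 1e6  # nL to mL for the whole plate

print(f"implied well length: {length_mm:.2f} mm")
print(f"total liquid per plate: {total_volume_ml:.0f} mL")
```

The implied length is also the optical path available for measurements taken along the capillary axis.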

The processes involved in array screening closely parallel those in 
microtiter plate screening, but with significant simplification in required instrumentation 
and decrease in plate storage capacity requirements and reagent costs. Briefly, the plates 
are filled with clones and reagents (e.g., fluorescent substrate, growth media, etc.) by
surface tension, filling all 100,000 wells simultaneously within a few seconds without
the need for complicated dispensing equipment. The number of clones per well, 
typically 1 to 10, is adjusted by dilution of the cell culture. Once filled, the plates are 
then incubated in a humidity-controlled environment for 24 to 48 hours to allow for both 
clonal amplification and enzymatic turnover. 

After incubation in a humidified chamber, the plates are transferred to the
detection and recovery station where fluorescence imaging is used to detect the
expression of bioactive molecules. The automated detection and recovery system 
combines fluorescence imaging and precision motion control technologies through the 
use of machine vision and image processing techniques. Images are generated by
focusing light from a broadband light source (e.g., metal halide arc lamp) onto the plate
through a set of fluorescence excitation filters. The resulting fluorescence emission is 
filtered then imaged by a telecentric lens onto a high-resolution cooled CCD camera in 
an epi-fluorescent configuration. The plates are scanned to generate a total of 56 slightly 
overlapping images in approximately one minute. The images are digitized and
processed on-the-fly to detect and locate positive wells or putative hits. Putative hits
(clones that have converted the substrate to a fluorescent product) appear as bright spots
on a dark background. They are distinguished from background fluorescence and
extraneous signals (typically due to dirt and dust) based on a variety of feature
measurements such as their shape, size, and intensity profile. 
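
A minimal sketch of this kind of hit detection is given below. It thresholds a small grayscale image, groups bright pixels into connected regions, and keeps only regions whose pixel count falls within a size window. It illustrates the size feature only (shape and intensity-profile tests are omitted), and the function name, threshold and size limits are hypothetical rather than the system's actual algorithm.

```python
def find_hits(image, threshold, min_pixels, max_pixels):
    """Return (row, col) centroids of bright connected regions of acceptable size."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    hits = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] < threshold:
                continue
            # Flood-fill the 4-connected bright region starting at (r, c).
            stack = [(r, c)]
            seen[r][c] = True
            pixels = []
            while stack:
                y, x = stack.pop()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            # Reject regions that are too small (e.g. dust) or too large.
            if min_pixels <= len(pixels) <= max_pixels:
                cy = sum(p[0] for p in pixels) / len(pixels)
                cx = sum(p[1] for p in pixels) / len(pixels)
                hits.append((cy, cx))
    return hits


# Toy image: one 2x2 bright well and one isolated bright speck.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 9, 9, 0, 0, 7],
    [0, 0, 0, 0, 0, 0],
]
hits = find_hits(image, threshold=5, min_pixels=2, max_pixels=10)
print(hits)
```

The centroid of each accepted region is the position that would be passed on to the recovery stage.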

Once detected and located, putative hits are recovered from the array
plate and transferred to a standard microtiter plate for confirmation and secondary
screening. The process of recovery consists of: 1) mounting and locating a sterile
recovery needle (typically a standard blunt end stainless steel needle commonly used for
dispensing adhesives for mounting miniature surface mount electronic components), 2)
aligning the recovery needle to the well containing the putative hit, 3) aspirating the
contents of the well into the needle (which has an attached 0.22 micron filter to avoid
upstream contamination and losing the sample), 4) flushing the well contents into a
standard microtiter plate with an appropriate media, and finally 5) stripping off the
recovery needle in preparation for the next recovery. Closed loop positioning with
image-based feedback provides the positional accuracy required to allow aspiration of
individual wells without contamination from neighboring wells. Finally, after the clones
of interest have been recovered, the used plates are cleaned, sterilized, and prepared for
re-use. The array platform according to the invention will accelerate the discovery and 
development of commercial products as well as enable the development of products that 
would otherwise be unobtainable. 

This invention is configured for use with a Fluorescence Activated Cell
Sorter (FACS). In FACS methodologies, cells are mixed with substrates and then
streamed past a detector to screen for a positive molecular event. This signal could be a
fluorescent signal resulting from the cleavage of an enzyme substrate or a specific
binding event. The greatest advantage of the use of a FACS machine is throughput; up to
10^9 clones can be screened/day. Unfortunately, FACS based screening also has
limitations including cell wall permeability of enzymes and substrates/products and
incubation times and temperatures. In addition, viability of host cells post-sort and 
dependence on a single data point for each individual cell further limit such 
technologies. 

The well diameter, plate thickness (well depth), and material optical
properties will be specified prior to fabricating the new 1,000,000-well density matrices.
Once these parameters are specified, high density matrices will be fabricated in
rectangular pieces approximately 1 cm square. The process entails a low-risk
modification to the same basic fabrication technique that is used to make the 100,000
well plates. The array density can be calculated by using the following formula:

$$\text{WellsPerPlate} = \frac{2\,(\text{PlateLength} \times \text{PlateWidth})}{\sqrt{3}\,(\text{WellDiameter} + \text{WellSeparationWall})^{2}}$$

This calculation reveals that in order to achieve 1,000,000 wells in the 
standard 3.3" x 5" microtiter plate format, the new wells will need to have a diameter of 
approximately 70 µm with 25 µm separating walls. Structures of this size/density and
smaller (down to 6 µm) are commonly manufactured for non-biological uses including
micro-channel faceplates for intensified CCD cameras, X-ray scintillation plates, optical 
collimators, as well as simple fluid filters. 
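
The density calculation can be checked numerically with the short sketch below. It assumes hexagonal close packing of the wells (one well per area of sqrt(3)/2 times the pitch squared, with pitch equal to well diameter plus wall thickness) and treats the whole 3.3" x 5" footprint as usable; both are assumptions made for illustration, and the function names are hypothetical.

```python
import math

INCH_MM = 25.4  # millimetres per inch


def wells_per_plate(length_mm, width_mm, well_diameter_um, wall_um):
    """Wells in a hexagonally packed array: 2*A / (sqrt(3) * pitch^2)."""
    pitch_mm = (well_diameter_um + wall_um) / 1000.0
    area_mm2 = length_mm * width_mm
    return 2.0 * area_mm2 / (math.sqrt(3.0) * pitch_mm ** 2)


def wells_per_cm2(well_diameter_um, wall_um):
    """Same packing formula applied to a 10 mm x 10 mm square."""
    return wells_per_plate(10.0, 10.0, well_diameter_um, wall_um)


# 70 um wells with 25 um walls on the full microtiter footprint.
full_plate = wells_per_plate(3.3 * INCH_MM, 5.0 * INCH_MM, 70.0, 25.0)
per_cm2 = wells_per_cm2(70.0, 25.0)
print(f"wells per plate: {full_plate:,.0f}")
print(f"wells per cm^2:  {per_cm2:,.0f}")
```

Under these assumptions the full footprint holds somewhat more than 1,000,000 wells, which leaves margin for plate borders and handling areas that the sketch ignores.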

There are some limitations to the depth of the wells due to the nature of 
the fabrication process. The current 100,000-well plates have 8mm deep wells. Based 
on our experience with structures of similar size, it is estimated that the depth of the 
70 µm wells will be between 5mm and 8mm. This yields a well volume of
approximately 25nl to 30nl or approximately 1/10th of that of the 200 µm diameter wells.
Evaporation rate is a function of the surface area to volume ratio rather than the total
volume. For this reason it is anticipated that the 70 µm wells will experience evaporation
comparable to (if not less than) that of the 200 µm wells due to a more favorable length to
diameter (volume to surface area) ratio. Evaporation is currently not a problem with the 200 µm
diameter wells. 
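
The volume and evaporation argument can be made concrete with a small sketch that models each well as an ideal cylinder open at both ends. The cylinder model, the two open ends, and the function names are illustrative assumptions; real capillaries may deviate from them.

```python
import math


def well_volume_nl(diameter_um, depth_mm):
    """Volume of an ideal cylindrical well in nanolitres (1 mm^3 = 1000 nl)."""
    radius_mm = diameter_um / 2000.0
    return math.pi * radius_mm ** 2 * depth_mm * 1000.0


def open_area_per_volume(diameter_um, depth_mm, open_ends=2):
    """Exposed liquid surface divided by volume, in 1/mm.

    For a cylinder this reduces to open_ends / depth, independent of diameter.
    """
    radius_mm = diameter_um / 2000.0
    exposed_area = open_ends * math.pi * radius_mm ** 2
    volume = math.pi * radius_mm ** 2 * depth_mm
    return exposed_area / volume


small_5mm = well_volume_nl(70.0, 5.0)
small_8mm = well_volume_nl(70.0, 8.0)
large_8mm = well_volume_nl(200.0, 8.0)
print(f"70 um well:  {small_5mm:.1f} to {small_8mm:.1f} nl")
print(f"200 um well: {large_8mm:.1f} nl")
print(open_area_per_volume(70.0, 8.0), open_area_per_volume(200.0, 8.0))
```

In this idealized model the exposed-surface-to-volume ratio depends only on well depth, which is consistent with the expectation that narrower wells of similar depth should not evaporate faster.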

Samples will be constructed from both transparent and opaque materials 
to evaluate illumination efficiencies, well-to-well optical cross-talk, surface-finish 
effects, and background fluorescence. The current 100,000-well plates use an opaque 
material. The use of transparent materials improves the efficiency of fluorescence 
excitation at the expense of increased well-to-well optical cross-talk. For assays with 
low hit rates, the tradeoff may favor the use of transparent materials to improve detection 
sensitivity. We estimate that the specification and manufacturing process will take two 
months. A special holder will also be fabricated to adapt the matrices to the capillary 
array hardware. Once the specified matrices are manufactured, they will be tested for 
each of the optical and mechanical properties detailed below: 

Background Fluorescence - It is helpful from an imaging and processing
perspective, but not critical, that the matrix have low background fluorescence for a
broad range of excitation wavelengths to allow use with a variety of substrates. The
materials used in the 200 µm plates were tested and selected to satisfy this requirement.
In the unlikely event that different materials must be used to fabricate both transparent
and opaque 70 µm matrices, they will be tested for their fluorescent properties prior to
fabrication. These tests are performed by measuring and comparing the fluorescence of
the material to a reference standard at a range of excitation wavelengths. 

Optical Efficiency - The 100,000-well plates are currently illuminated by
a roughly collimated beam directly on the face of the plate. Light enters each well
through the aperture formed by the wall around the well. Transparent materials are
expected to offer illumination advantages over opaque materials with the current
illumination system by transmitting additional excitation energy through the walls
separating the wells. The optical efficiency of the 1,000,000-well density matrices will
be evaluated by determining the detectable concentration of a fluorescein solution.
Typically, liquid phase enzyme discovery assays use 10-100 µM concentrations of
fluorescent substrate. The current detection system can detect approximately 10 nM of
fluorescein in the 200 µm wells. The equivalent fluorescence of LB (our typical cell
growth media) is approximately 25 nM. Hardware modifications described in Goal 3 may
be required in the unlikely event that the detectable levels are less than 10 µM for the
new matrices.

Optical Cross-talk - While the use of transparent materials may improve
the efficiency of fluorescence excitation as described above, it does so at the expense of
increased well-to-well optical cross-talk. This optical cross-talk is due to fluorescence
emission that leaks from one well into its neighbors. This is easily quantified by
spotting a fluorophore onto the matrix, and then measuring the signal intensity vs.
distance from a fluorophore filled well. The cross-talk could potentially mask the signal
of a weak positive well resulting in a false negative or be detected as a false positive. In
applications where the expected hit rate is low (which is commonly the case with
enzyme discovery from environmental libraries) the probability of this occurring is
generally insignificant. However, cross-talk can complicate the image processing
required to automatically locate putative hits and therefore must be evaluated.

Surface Tension/Wicking Properties - The plates are filled by placing the
surface of the plate in contact with the assay solution. Surface tension at the liquid/plate
interface causes the assay components to be drawn or wicked into all of the wells
simultaneously. The surface preparation of the plate can have significant effects on the
wicking properties of the matrix. Some surface polishing techniques have been found to
make the glass face of the plate hydrophobic, thus preventing or significantly slowing
the filling of the plate. Initially, the same surface finish currently used on the 100,000-well
plate will be tested. If necessary, matrices with different surface preparations will
be placed into contact with a cell/media mixture and their wicking properties quantified
by timing the filling process and weighing the matrices before and after filling. In the
event that plate filling remains inadequate after testing available surface preparations and
treatments, surfactants can be added to improve filling.

Resistance to Cleaning and Sterilization - It is desirable for the 
1,000,000-well plates to be reusable. To validate this requirement, the matrices will be 
processed through multiple, rigorous cleaning and sterilization protocols. Currently,
there is a great deal of latitude in both the cleaning and sterilization protocols. Cleaning
can consist of a combination of flushing, soaking, and/or sonication in water, solvents
and/or soaps. Likewise, due to the inherent ruggedness of the materials used,
sterilization can be accomplished by autoclaving, bleach, ethanol, and/or acid washing.
Cleanliness is verified by fluorescence imaging of the material at multiple excitation
wavelengths. Sterilization is verified by overnight incubation of matrices filled with
sterile growth media, followed by plating the contents onto agar and looking for colony
formation. 

Only minimal modifications to the detection system hardware will be
required for the 1,000,000-well density matrices. Due to the reduced size of the wells,
minor modifications to the optical system may need to be made to adjust the
magnification to an appropriate level to determine screening feasibility. The optical
system will likely need further modification as proposed in Phase II to enable automated
hit recovery. A commercially available 2x extender can be added to the existing
telecentric imaging lens used for the current 100,000-well plate. This modification will
render the final image size of each well (relative to the camera) approximately 70% of
the current size. Based on our experience, this should be more than adequate to visualize
positive wells for determining feasibility.

As mentioned above, the detection sensitivity of the new matrices is
expected to be lower (especially for opaque matrices) than for the current plates using
the current detection system hardware. In addition to the use of transparent matrices, a
number of hardware enhancements could significantly improve sensitivity,
including: a higher sensitivity cooled CCD camera; laser based illumination or another
higher power density light source; and faster (possibly non-telecentric) imaging optics.

In order to fully take advantage of the throughput afforded by 1,000,000
well plates, a large number of unique clones must be generated. Two alternative
methods for preparing large numbers (10^7 to 10^9) of clones per day for screening can be
used with the 100,000-well plates. They will both be tested for use with the 1,000,000-well
density matrices and are described below. One effort will use Resorufin β-D-galactopyranoside
(Molecular Probes #R-1159) as the fluorescent substrate and a
positive β-galactosidase control clone (535-GL2) for both assay development and
feasibility screening. This substrate and positive clone were well characterized and
validated during the development of the 100,000-well platform.

Method 1: Screening Lambda Phage Libraries for Enzymatic Activity -
Gene libraries cloned into lambda-based vectors are first titered by plating dilutions on
soft agar in the presence of an appropriate E. coli host strain according to standard
techniques. Using this titer information, an adequate amount of the lambda library is
allowed to adsorb to the host. After 15 minutes, a mixture of growth medium and
fluorescent substrate is then added to produce a final suspension having the following
characteristics: [1] a density of host cells that will allow both sufficient growth and an
effective multiplicity of infection, [2] an optimal concentration of fluorescent substrate
for detection of the enzymatic activity, and [3] a density of phage particles such that,
when loaded into a 1,000,000-well density matrix, each well will contain an average of
1-4 library clones. (Densities of 5-10 clones per well will be attempted once the initial
details are worked out.) A sample of this suspension is plated on soft agar to determine
the average seed density of library clones (concomitant titer). The remainder of the
suspension is used to load the wells of the matrices. The plates are incubated at 37°C for
16-24 hours (protected from light and evaporative loss; see note on Incubation below) to
allow lytic multiplication of bacteriophage in the wells prior to detection and recovery.

Method 2: Screening Phagemid and Other Colony-Based Libraries for
Enzymatic Activity - Phagemid libraries are produced from parental bacteriophage
libraries using an in vivo excision process (Short et al., 1988). Following initial titering,
these libraries are used to infect an appropriate E. coli host strain. After the 15-minute
adsorption period, cells are supplied with a small amount of medium and allowed to
grow at 30 degrees Celsius without antibiotic selection for 45 minutes to allow
expression of the antibiotic resistance gene present on the phagemid. The suspension is
then plated onto solid plates containing antibiotic and allowed to grow at 30 degrees
Celsius overnight. Amplified clones from the resulting antibiotic-resistant colonies are
collected into a pooled suspension. A mixture of antibiotic, fluorescent substrate and
growth medium is then added to produce the final suspension used to load the
high-density matrices (with characteristics analogous to [2] and [3] above). A sample of this
suspension is also plated onto solid agar plates containing antibiotic to determine the
average seed density of library clones (concomitant titer). The matrices are then
incubated at 30-37 degrees C for 1-2 days (protected from light and evaporative loss; see
note on Incubation below) to allow phagemid-containing host cells to multiply within the
wells prior to detection and recovery.
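
The relationship between suspension titer and the average number of library clones per well, which both methods rely on, can be sketched as follows. The sketch assumes that clones distribute randomly among wells (Poisson statistics) and uses a nominal 30 nl well volume; the Poisson model and the function names are illustrative assumptions rather than part of the described procedure.

```python
import math


def required_titer_per_ml(mean_clones_per_well, well_volume_nl):
    """Clone concentration (per ml) that gives the desired mean occupancy."""
    wells_per_ml = 1.0e6 / well_volume_nl  # 1 ml = 1e6 nl
    return mean_clones_per_well * wells_per_ml


def poisson_pmf(k, mean):
    """Probability that a well receives exactly k clones."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def fraction_empty(mean_clones_per_well):
    """Expected fraction of wells that receive no clone at all."""
    return poisson_pmf(0, mean_clones_per_well)


for mean in (1, 4, 10):
    titer = required_titer_per_ml(mean, well_volume_nl=30.0)
    print(f"mean {mean:>2} clones/well: {titer:.2e} clones/ml, "
          f"{fraction_empty(mean):.1%} of wells empty")
```

The concomitant titer measured by plating can be compared against the target concentration computed this way to confirm the seed density actually loaded.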

Libraries created in other vectors (e.g. cosmid, fosmid, PAC, YAC,
BAC, etc.) are also screened using this platform. Factors such as growth requirements,
transformation modality, and transformation efficiency have to be taken into
consideration when adapting a particular library vector to this technology. The use of a
variety of library and vector types permits screening for small molecules and protein
therapeutics in addition to novel enzymes.

The array plates are typically incubated in a humidified incubator at 90%
relative humidity for 24 to 48 hours. The plates are stackable and designed such that
each plate is contained within a humidity and temperature stable environment by the 
plates above and below it. Lids or extra plates filled with water are used at the top and 
bottom of each stack to seal the end plates. The incubation process requires validation of 
cell growth, evaporation, and condensation. 

The growth of E. coli, which will be used as the enzyme screening host,
has been clearly demonstrated in the 100,000 well array plate. Other types of cells
including Streptomyces, mammalian (Jurkat human leukemic T cells), and lambda phage
have also been shown to grow in this format. Cell growth in the 1,000,000-well density
matrices will be verified by the same procedure used for the 100,000-well plates. The
number of colonies formed by plating the initial cell solution (diluted to 1 to 10
clones/well) will be compared to a culture of equal volume aspirated from the matrix
after incubation. Although difficulties in cell growth are not anticipated, there are
alternative strategies to mitigate these difficulties. The surface area to volume ratio of
the 1,000,000-well density matrices is less favorable for oxygen diffusion into the assay
solution than in the 100,000-well format. If oxygen diffusion appears to be limiting cell
growth, we will evaluate methods for increasing oxygenation. Preliminary experiments
have successfully demonstrated fluidic mixing in 200 µm diameter wells using
paramagnetic beads in a fluctuating magnetic field and by agitation with sound pulses.
Magnetic mixing has been shown to vastly improve the growth of Streptomyces in the
100,000-well format. 

If necessary, these mixing methods could be employed to improve
oxygen diffusion and cell growth. Other methods include oxygen saturation of the assay
solution prior to plate filling, incubation in a high oxygen environment, and the addition
of time-released oxygen generating compounds such as sodium percarbonate. With a
total assay volume of approximately 30nl, controlling evaporation from the 1,000,000-well
plates will be critical. However, as mentioned above, the surface to volume ratio is
favorable for minimizing evaporation. Evaporation studies conducted in 100,000-well
plates indicate a 10% loss of media volume over 24 hours. This loss is reduced to 5%
with the addition of 10% glycerol. Because the surface area to volume ratio of the
1,000,000-well plates will be similar to (if not more favorable than) that of the 100,000-well
plates, comparable evaporation is expected.
Evaporation in the higher density matrices will be measured by filling the plates with
typical assay media and weighing them at several time points over a 96-hour period. If
stricter evaporation control is required, glycerol can be added.

The effects of condensation/moisture on the surface of the matrices are
also considered. Because they are incubated in high-humidity environments, droplets on
the outer surfaces of the matrices that remain after filling or condense during incubation
may not evaporate and can cause well to well cross-contamination. These droplets can
lead to the detection of false positives in wells neighboring a true positive as well as
cause a blotchy appearance on the plate surface that obscures weak positives. Such
problems with surface droplets remaining after filling the 100,000-well plates are
avoided by letting them sit at room temperature until all of the surface moisture has
evaporated. Avoiding condensation during incubation is accomplished by using strict
temperature and humidity control. This issue is addressed by placing the filled plates in
a programmable humidified chamber that starts with low humidity and increases it to the
desired incubation humidity only after the plates have warmed to the chamber
temperature. Once warm, the stacked plates form a relatively stable thermal mass
immune to the small temperature fluctuations in the chamber. Surface moisture control
issues will be similar in the higher density plates. The matrices will be tested to see if
these methods successfully control surface moisture.

Negative libraries spiked with the positive β-gal clone at a defined
frequency will be the first subjects of a feasibility screen. The same screen will be
performed in parallel in a conventional microtiter format for comparison. Once this is
proven, screening will proceed (again in parallel with microtiter format) to libraries
known to contain positive clones. A mixed population library was validated for this
purpose during the development of the 100,000-well platform and will be used for the 
1,000,000-well feasibility screening. These experiments will be performed for both 
lambda-based and phagemid-based library screens since clonal amplification rates, and 
thus signal intensities, may differ between bacteriophage and whole cell assays. 

Validation of the feasibility screens can be performed by simply
comparing the number of positive wells in the fluorescence images of the 1,000,000-well
matrices to those in a 100,000-well array plate filled with the identical assay solution.

Further verification will be done in standard microtiter format. The
number of positive wells is a function of the concentration of positive clones in the initial
assay solution and the volume of the wells. Since the well volume of the 1,000,000-well
matrices is approximately 1/10th that of the 100,000 well plates, the expected number of
positive wells should also be about 1/10th when loading the same initial assay solution.
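
The scaling of positive wells with well volume can be illustrated with the sketch below. It assumes that positive clones are distributed randomly among wells (a Poisson model) and compares equal numbers of wells; the concentration value, the 25 nl and 250 nl volumes, and the function name are hypothetical inputs chosen only for illustration.

```python
import math


def expected_positive_wells(n_wells, positives_per_nl, well_volume_nl):
    """Expected number of wells holding at least one positive clone."""
    mean_per_well = positives_per_nl * well_volume_nl
    return n_wells * (1.0 - math.exp(-mean_per_well))


# Same assay solution and same number of wells, different well volumes.
concentration = 1.0e-5  # positive clones per nl (illustrative)
small = expected_positive_wells(10_000, concentration, well_volume_nl=25.0)
large = expected_positive_wells(10_000, concentration, well_volume_nl=250.0)
ratio = small / large
print(f"positives in 25 nl wells:  {small:.2f}")
print(f"positives in 250 nl wells: {large:.2f}")
print(f"ratio: {ratio:.3f}")
```

At low positive-clone concentrations the ratio approaches the volume ratio, in line with the 1/10th estimate; at high concentrations, where most large wells are already positive, the simple proportionality would no longer hold.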

The array of capillaries can be arranged to fit within a footprint of a
microtiter plate, one standard of which is a footprint of 3.3" x 5". Within that footprint,
up to 1,000,000 or more capillaries, or wells, can be provided in the array. A 1,000,000
well platform for screening gene libraries from mixed populations of organisms for
novel enzymatic activities provides an ultra high-throughput screening platform in the
3.3" x 5" footprint of a standard microtiter plate. In this format each well includes a
capillary having a diameter of 200 µm, and which holds 250nl. The array platform
permits rapid screening of genes and gene pathways, and increases the productivity of
discovery and gene optimization programs for products such as novel enzymes, protein
therapeutics, compounds and small molecule drugs. Any number of novel enzymes of
various catalytic classes (e.g., amylases, proteases, secondary amidases) can be 
discovered using the array platform. The same proprietary cost effective process by 
which the 100,000-well plates are made can be utilized to make the 1,000,000-well 
plates for smaller, non-biological applications. 

The array screening platform greatly expands the amount of molecular
diversity that can be screened to discover new products. Using 1,000,000-well plates,
employing over 12,000 wells per square centimeter, more than one billion clones per day
can be screened using standard liquid phase fluorescent assays, while at the same time
reducing equipment and operator time through massively parallel dispensing and reading
of biological samples. Additionally, the 1,000,000-well plates, with wells each about
half the diameter of a human hair, are reusable and require only minuscule volumes of
reagents, making them highly cost effective and environmentally responsible.
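
As a rough check on the stated throughput, the sketch below estimates how many 1,000,000-well plates would have to be processed per day to screen one billion clones at the loading densities of 1 to 10 clones per well mentioned earlier. The function name is hypothetical, and the calculation ignores empty wells and duplicate clones.

```python
def plates_per_day(target_clones_per_day, wells_per_plate, clones_per_well):
    """Plates needed per day to reach a target number of screened clones."""
    return target_clones_per_day / (wells_per_plate * clones_per_well)


for density in (1, 4, 10):
    needed = plates_per_day(1.0e9, 1_000_000, density)
    print(f"{density:>2} clones/well -> {needed:.0f} plates/day")
```

At 10 clones per well this corresponds to on the order of one hundred plates per day, so the achievable daily throughput depends strongly on the loading density that the assay tolerates.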

Increasing the liquid phase screening density from 100,000 to 1,000,000
wells per microtiter plate footprint represents a 10x increase in density that contributes to
accelerated discovery and development of commercial products, such as antibody and
protein therapeutic programs that require rapid screening of very large numbers of
antibody and protein variants created by evolution technologies. This invention includes
the design and fabrication of 1 cm square matrices with 1,000,000 well/plate density (i.e.
12,000 wells/cm²) using a process that is scalable to full microtiter plate sized arrays.

The platform can be utilized to develop a novel liquid phase nitrilase
assay in the 1,000,000-well format, as well as screening gene libraries from mixed
populations of organisms for chiral nitrilases for use in the manufacture of chemical 
intermediates for chiral therapeutic compounds. 

Naked Biopanning involves the direct screening or enrichment for a
gene or gene cluster from environmental genomic DNA. The enrichment for or
isolation of the desired genomic DNA is performed prior to any cloning, gene-specific
PCR or any other procedure that may introduce unwanted bias affecting downstream
processing and applications due to toxicity or other issues. Several methodologies
can be described for this type of sequence based discovery. These generally include
the use of nucleic acid probe(s) that is(are) partially or completely homologous to the
target sequence in conjunction with the binding of the probe-target complex to a solid
phase support. The probe(s) may be polynucleotide or modified nucleic acid, such as
peptide nucleic acid (PNA) and may be used with other facilitating elements such as 
proteins or additional nucleic acids in the capture of target DNA. An amplification 
step which does not introduce sequence bias may be used to ensure adequate yield for 
downstream applications. 

An example of a Naked Biopanning approach can be found in the use
of RecA protein and a complement-stabilized D-loop (csD-loop) structure (Jayasena
& Johnston, 1993; Sena and Zarling, 1993) to target genomic DNA of interest. It
does not involve complete denaturation of the target DNA and therefore is of
particular interest when one is attempting to capture large genomic fragments. The
following method incorporates the ClonCapture™ cDNA selection procedure
(CLONTECH Laboratories, Inc.), with some modification, to take advantage of csD-
loop formation, a stable structure which may be used to capture genomic DNA 
containing an internal target sequence: 

Environmental genomic DNA is cleaved into fragments (fragment size
depends upon type of target and desired downstream insert size if making a
pre-enriched library) using mechanical shearing or restriction digest. Fragments are size
selected according to desired length and purified. A biotinylated dsDNA probe is
produced, based upon existing knowledge of conserved regions within the target, by
PCR from a positive clone or by synthetic means. The probe can be internally (e.g.,
by incorporation of biotin-21-dCTP) or end labeled with biotin. It must be purified to
remove any unincorporated biotin. The probe is heat denatured (5 min. at 95°C) and
placed immediately on ice. The denatured probe is then reacted with RecA and an
ATP mix containing ATP and a nonhydrolyzable analog (15 min. at 37°C). The target
DNA is added and incubated with the RecA/biotinylated probe microfilaments to
form the csD-loop structure (20 min. at 37°C). The RecA is then removed by
treatment with proteinase K and SDS. After inactivating the proteinase K with
PMSF, washed and blocked (with sonicated salmon sperm DNA) streptavidin
paramagnetic beads are transferred to the reaction and incubated to bind the csD-loop
complex to the support (rotate 30 min. at room temp.). The unbound DNA is
removed and may be saved for use as target for a different probe. The beads are
thoroughly washed and the enriched population is eluted using an alkaline buffer and
transferred off. The enriched DNA is then ethanol precipitated and is ready for
ligation and pre-enriched library preparation. 

Other stable complexes may be used instead of the RecA/csD-loop 
structure for the capture of genomic DNA. For instance, PNAs may be used, either as 
"openers" to allow insertion of a probe into dsDNA (Bukanov et al., 1998), or as
tandem probes themselves (Lohse et al., 1999). In the first case, PNAs bind to two
short tracts of homopurines that are in close proximity to each other. They form
P-loop structures, which displace the unbound strand and make it available for binding
by a probe, which can then be used to capture the target using an affinity capture
method involving a solid phase. Likewise, PNAs may be used in a "double-duplex
invasion" to form a stable complex and allow target recovery.

Simpler methods may be used in the retrieval of targets from 
environmental genomic DNA that involve complete denaturation of the DNA 
fragments. After cutting genomic DNA into fragments of the desired length via 
mechanical shearing or through the use of restriction enzymes, the target DNA may
be bound to a solid phase using a direct hybridization affinity capture scheme. A
nucleic acid probe is covalently bound to a solid phase such as a glass slide,
paramagnetic bead, or any type of matrix in a column, and the denatured target DNA
is allowed to hybridize to it. The unbound fraction may be collected and
re-hybridized to the same probe to ensure a more complete recovery, or to a host of
different probes, as a part of a cascade scenario, where a population of environmental
genomic DNA is subsequently panned for a number of different genes or gene 
clusters. 

Linkers containing restriction sites and sites for common primers may
be added to the ends of the genomic fragments using sticky-ended or blunt-ended
ligations (depending upon the method used for cutting the genomic DNA). These
enable one to amplify the size-selected inserted fragment population by PCR without
significant sequence bias. Thus, after using any of the abovementioned techniques for
isolation or enrichment, one may help to ensure adequate recovery for downstream
processing. Furthermore, the recovered population is ready for cutting and ligation 
into a suitable vector as well as containing the priming sites for sequencing at any 
time. 

A variation of the above scheme involves including a tag from a
combinatorial synthesis of polynucleotide tags (Brenner et al., 1999) within the linker
that is attached onto the ends of the genomic fragments. This allows each fragment
within the starting population to have its own unique tag. Therefore, when amplified
with common primers, each of these uniquely tagged fragments gives rise to a
multitude of in vitro clones which are then bound to the paramagnetic bead containing
millions of copies of the complementary, covalently bound anti-tag. A fluorescently
labeled, target specific probe may be subsequently hybridized to the target-containing
beads. The beads may be sorted using FACS, where the positives may be sequenced
directly from the beads and the insert may be cut out and ligated into the desired
vector for further processing. The negative population may be hybridized with other
probes and resorted as part of the cascade scenario previously described. 

Transposon technology may allow the insertion of environmental 
genomic DNA into a host genome through the use of transposomes (Goryshin & 
Reznikoff, 1998) to avoid bias resulting from expression of toxic genes. The host
cells are then cultured to provide more copies of target DNA for discovery, isolation,
and downstream processes. 

Without further elaboration, it is believed that one skilled in the art 
can, using the preceding description, utilize the present invention to its fullest extent. 
The following examples are to be considered illustrative and thus are not limiting of
the remainder of the disclosure in any way whatsoever.

Examples

Example 1: DNA Isolation and Library Construction

The following outlines the procedures used to generate a gene library from
a mixed population of organisms.

DNA isolation. DNA is isolated using the IsoQuick Procedure as per
manufacturer's instructions (Orca Research Inc., Bothell, WA). DNA can be
normalized according to Example 2 below. Upon isolation the DNA is sheared by
pushing and pulling the DNA through a 25G double-hub needle and 1-cc syringes
about 500 times. A small amount is run on a 0.8% agarose gel to make sure the
majority of the DNA is in the desired size range (about 3-6 kb).

Blunt-ending DNA. The DNA is blunt-ended by mixing 45 ul of 10X
Mung Bean Buffer, 2.0 ul Mung Bean Nuclease (150 u/ul) and water to a final
volume of 405 ul. The mixture is incubated at 37°C for 15 minutes. The mixture is
phenol/chloroform extracted followed by an additional chloroform extraction. One
ml of ice cold ethanol is added to the final extract to precipitate the DNA. The DNA
is precipitated for 10 minutes on ice. The DNA is removed by centrifugation in a
microcentrifuge for 30 minutes. The pellet is washed with 1 ml of 70% ethanol and
repelleted in the microcentrifuge. Following centrifugation the DNA is dried and
gently resuspended in 26 ul of TE buffer. 

Methylation of DNA. The DNA is methylated by mixing 4 ul of 10X EcoR
I Methylase Buffer, 0.5 ul SAM (32 mM), 5.0 ul EcoR I Methylase (40 u/ul) and
incubating at 37°C for 1 hour. To ensure blunt ends, add to the methylation
reaction: 5.0 ul of 100 mM MgCl2, 8.0 ul of dNTP mix (2.5 mM each of dGTP,
dATP, dTTP, dCTP) and 4.0 ul of Klenow (5 u/ul), and incubate at 12°C for 30 minutes.

After 30 minutes add 450 ul 1X STE. The mixture is phenol/chloroform
extracted once followed by an additional chloroform extraction. One ml of ice cold
ethanol is added to the final extract to precipitate the DNA. The DNA is precipitated
for 10 minutes on ice. The DNA is pelleted by centrifugation in a microcentrifuge
for 30 minutes. The pellet is washed with 1 ml of 70% ethanol, repelleted in the
microcentrifuge and allowed to dry for 10 minutes.

Ligation. The DNA is ligated by gently resuspending the DNA in 8 ul
EcoR I adaptors (from Stratagene's cDNA Synthesis Kit), 1.0 ul of 10X Ligation
Buffer, 1.0 ul of 10 mM rATP, 1.0 ul of T4 DNA Ligase (4 Wu/ul) and incubating at
4°C for 2 days. The ligation reaction is terminated by heating for 30 minutes at 70°C.

Phosphorylation of adaptors. The adaptor ends are phosphorylated by
mixing the ligation reaction with 1.0 ul of 10X Ligation Buffer, 2.0 ul of 10 mM
rATP, 6.0 ul of H2O, 1.0 ul of polynucleotide kinase (PNK) and incubating at 37°C
for 30 minutes. After 30 minutes 31 ul H2O and 5 ul 10X STE are added to the
reaction and the sample is size fractionated on a Sephacryl S-500 spin column. The
pooled fractions (1-3) are phenol/chloroform extracted once followed by an additional
chloroform extraction. The DNA is precipitated by the addition of ice cold ethanol on
ice for 10 minutes. The precipitate is pelleted by centrifugation in a microfuge at high
speed for 30 minutes. The resulting pellet is washed with 1 ml 70% ethanol,
repelleted by centrifugation and allowed to dry for 10 minutes. The sample is
resuspended in 10.5 ul TE buffer. Do not plate. Instead, ligate directly to lambda
arms as above except use 2.5 ul of DNA and no water.

Sucrose Gradient (2.2 ml) Size Fractionation. Stop ligation by heating the
sample to 65°C for 10 minutes. Gently load sample on 2.2 ml sucrose gradient and
centrifuge in mini-ultracentrifuge at 45K, 20°C for 4 hours (no brake). Collect
fractions by puncturing the bottom of the gradient tube with a 20G needle and
allowing the sucrose to flow through the needle. Collect the first 20 drops in a Falcon
2059 tube then collect ten 1-drop fractions (labeled 1-10). Each drop is about 60 ul in
volume. Run 5 ul of each fraction on a 0.8% agarose gel to check the size. Pool
fractions 1-4 (about 10-1.5 kb) and, in a separate tube, pool fractions 5-7 (about 5-0.5
kb). Add 1 ml ice cold ethanol to precipitate and place on ice for 10 minutes. Pellet
the precipitate by centrifugation in a microfuge at high speed for 30 minutes. Wash
the pellets by resuspending them in 1 ml 70% ethanol and repelleting them by
centrifugation in a microfuge at high speed for 10 minutes and dry. Resuspend each
pellet in 10 ul of TE buffer.

Test Ligation to Lambda Arms. Plate assay by spotting 0.5 ul of the
sample on agarose containing ethidium bromide along with standards (DNA samples
of known concentration) to get an approximate concentration. View the samples
using UV light and estimate concentration compared to the standards. Fraction 1-4 =
>1.0 ug/ul. Fraction 5-7 = 500 ng/ul.

Prepare the following ligation reactions (5 ul reactions) and incubate at 4°C
overnight:



Sample        H2O     10X Ligase  10 mM   Lambda arms  Insert  T4 DNA Ligase
                      Buffer      rATP    (ZAP)        DNA     (4 Wu/ul)
Fraction 1-4  0.5 ul  0.5 ul      0.5 ul  1.0 ul       2.0 ul  0.5 ul
Fraction 5-7  0.5 ul  0.5 ul      0.5 ul  1.0 ul       2.0 ul  0.5 ul



Test Package and Plate. Package the ligation reactions following the
manufacturer's protocol. Stop packaging reactions with 500 ul SM buffer and pool
packaging that came from the same ligation. Titer 1.0 ul of each pooled reaction on
appropriate host (OD600 = 1.0) [XL1-Blue MRF']. Add 200 ul host (in 10 mM MgSO4) to
Falcon 2059 tubes, inoculate with 1 ul packaged phage and incubate at 37°C for 15
minutes. Add about 3 ml 48°C top agar [50 ml stock containing 150 ul IPTG (0.5 M)
and 300 ul X-GAL (350 mg/ml)] and plate on 100 mm plates. Incubate the plates at
37°C, overnight.

Amplification of Libraries (5.0 x 10^5 recombinants from each library).
Add 3.0 ml host cells (OD600 = 1.0) to two 50 ml conical tubes and inoculate with 2.5 x
10^5 pfu of phage per conical tube. Incubate at 37°C for 20 minutes. Add top agar to
each tube to a final volume of 45 ml. Plate each tube across five 150 mm plates.
Incubate the plates at 37°C for 6-8 hours or until plaques are about pin-head in size.
Overlay the plates with 8-10 ml SM Buffer and place at 4°C overnight (with gentle
rocking if possible).

Harvest Phage. Recover phage suspension by pouring the SM buffer
off each plate into a 50-ml conical tube. Add 3 ml of chloroform, shake vigorously
and incubate at room temperature for 15 minutes. Centrifuge the tubes at 2K rpm for
10 minutes to remove cell debris. Pour supernatant into a sterile flask, add 500 ul
chloroform and store at 4°C.

Titer Amplified Library. Make serial dilutions of the harvested phage
(for example, 10^-3 = 1 ul amplified phage in 1 ml SM Buffer; 10^-6 = 1 ul of the 10^-3
dilution in 1 ml SM Buffer). Add 200 ul host (in 10 mM MgSO4) to two tubes.
Inoculate one tube with 10 ul of the 10^-6 dilution (10^-5). Inoculate the other tube
with 1 ul of the 10^-6 dilution (10^-6). Incubate at 37°C for 15 minutes. Add about
3 ml 48°C top agar [50 ml stock containing 150 ul IPTG (0.5 M) and 375 ul X-GAL
(350 mg/ml)] to each tube and plate on 100 mm plates. Incubate the plates at 37°C
overnight. Excise the ZAP II library to create the pBLUESCRIPT library according to
the manufacturer's protocols (Stratagene).
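
The dilution arithmetic in the titering step can be sketched in a few lines of Python. This is an illustrative sketch only and not part of the protocol: the function names are hypothetical, and each transfer of 1 ul into 1 ml is treated as a 10^-3 dilution, the same approximation the text uses.

```python
def serial_dilution(steps):
    """Cumulative dilution factor for a list of (transfer_ul, diluent_ml) steps.

    Assumption: the transferred volume is negligible next to the diluent,
    so 1 ul into 1 ml counts as 1e-3 (as in the protocol text).
    """
    factor = 1.0
    for transfer_ul, diluent_ml in steps:
        if transfer_ul <= 0 or diluent_ml <= 0:
            raise ValueError("volumes must be positive")
        factor *= transfer_ul / (diluent_ml * 1000.0)
    return factor


def titer_pfu_per_ml(plaques, dilution, plated_ul):
    """Titer of the undiluted stock (pfu/ml) from a plaque count."""
    if plaques < 0 or dilution <= 0 or plated_ul <= 0:
        raise ValueError("invalid input")
    undiluted_ml = dilution * plated_ul / 1000.0  # ml of original stock on the plate
    return plaques / undiluted_ml


two_step = serial_dilution([(1, 1), (1, 1)])   # two 1 ul -> 1 ml transfers
example_titer = titer_pfu_per_ml(150, two_step, 10)  # 150 plaques is a made-up count
```

Plating 10 ul and 1 ul of the same dilution gives two counts that should differ about tenfold, which is a quick consistency check on the plates.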

Example 2: Construction of a Stable, Large Insert Picoplankton Genomic DNA 
Library 

Cell collection and preparation of DNA. Agarose plugs containing
concentrated picoplankton cells were prepared from samples collected on an
oceanographic cruise from Newport, Oregon to Honolulu, Hawaii. Seawater (30
liters) was collected in Niskin bottles, screened through 10 um Nitex, and concentrated
by hollow fiber filtration (Amicon DC10) through 30,000 MW cutoff polysulfone
filters. The concentrated bacterioplankton cells were collected on a 0.22 um, 47 mm
Durapore filter, and resuspended in 1 ml of 2X STE buffer (1 M NaCl, 0.1 M EDTA,
10 mM Tris, pH 8.0) to a final density of approximately 1 x 10^10 cells per ml. The cell
suspension was mixed with one volume of 1% molten Seaplaque LMP agarose
(FMC) cooled to 40°C, and then immediately drawn into a 1 ml syringe. The syringe
was sealed with parafilm and placed on ice for 10 min. The cell-containing agarose
plug was extruded into 10 ml of Lysis Buffer (10 mM Tris pH 8.0, 50 mM NaCl, 0.1
M EDTA, 1% Sarkosyl, 0.2% sodium deoxycholate, 1 mg/ml lysozyme) and
incubated at 37°C for one hour. The agarose plug was then transferred to 40 ml of
ESP Buffer (1% Sarkosyl, 1 mg/ml proteinase K, in 0.5 M EDTA), and incubated at
55°C for 16 hours. The solution was decanted and replaced with fresh ESP Buffer, and
incubated at 55°C for an additional hour. The agarose plugs were then placed in 50
mM EDTA and stored at 4°C shipboard for the duration of the oceanographic cruise.

One slice of an agarose plug (72 ul) prepared from a sample collected
off the Oregon coast was dialyzed overnight at 4°C against 1 mL of buffer A (100 mM
NaCl, 10 mM Bis Tris Propane-HCl, 100 ug/ml acetylated BSA: pH 7.0 @ 25°C) in a
2 mL microcentrifuge tube. The solution was replaced with 250 ul of fresh buffer A
containing 10 mM MgCl2 and 1 mM DTT and incubated on a rocking platform for 1
hr at room temperature. The solution was then changed to 250 ul of the same buffer
containing 4 U of Sau3AI (NEB), equilibrated to 37°C in a water bath, and then
incubated on a rocking platform in a 37°C incubator for 45 min. The plug was
transferred to a 1.5 ml microcentrifuge tube and incubated at 68°C for 30 min to
inactivate the enzyme and to melt the agarose. The agarose was digested and the
DNA dephosphorylated using Gelase and HK-phosphatase (Epicentre), respectively,
according to the manufacturer's recommendations. Protein was removed by gentle
phenol/chloroform extraction and the DNA was ethanol precipitated, pelleted, and
then washed with 70% ethanol. This partially digested DNA was resuspended in
sterile H2O to a concentration of 2.5 ng/ul for ligation to the pFOS1 vector.

PCR amplification results from several of the agarose plugs (data not
shown) indicated the presence of significant amounts of archaeal DNA. Quantitative
hybridization experiments using rRNA extracted from one sample, collected at 200 m
of depth off the Oregon Coast, indicated that planktonic archaea in this assemblage
comprised approximately 4.7% of the total picoplankton biomass. This sample
corresponds to "PAC1"-200 m in Table 1 of DeLong et al. (DeLong, 1994), which is
incorporated herein by reference. Results from archaeal-biased rDNA PCR
amplification performed on agarose plug lysates confirmed the presence of relatively
large amounts of archaeal DNA in this sample. Agarose plugs prepared from this
picoplankton sample were chosen for subsequent fosmid library preparation. Each 1
ml agarose plug from this site contained approximately 7.5 x 10^5 cells, therefore
approximately 5.4 x 10^5 cells were present in the 72 ul slice used in the preparation of
the partially digested DNA.

Vector arms were prepared from pFOS1 as described by Kim et al.
(Kim, 1992). Briefly, the plasmid was completely digested with AstII,
dephosphorylated with HK phosphatase, and then digested with BamHI to generate
two arms, each of which contained a cos site in the proper orientation for cloning and
packaging ligated DNA between 35-45 kbp. The partially digested picoplankton
DNA was ligated overnight to the pFOS1 arms in a 15 ul ligation reaction containing
25 ng each of vector and insert and 1 U of T4 DNA ligase (Boehringer-Mannheim).
The ligated DNA in four microliters of this reaction was in vitro packaged using the
Gigapack XL packaging system (Stratagene), the fosmid particles transfected to E.
coli strain DH10B (BRL), and the cells spread onto LB cm15 plates. The resultant
fosmid clones were picked into 96-well microtiter dishes containing LB cm15
supplemented with 7% glycerol. Recombinant fosmids, each containing ca. 40 kb of
picoplankton DNA insert, yielded a library of 3,552 fosmid clones, containing
approximately 1.4 x 10^8 base pairs of cloned DNA. All of the clones examined
contained inserts ranging from 38 to 42 kbp. This library was stored frozen at -80°C
for later analysis.
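
The stated library size follows from the clone count and the insert size. The short Python sketch below reproduces that arithmetic; it reads the clone count as 3,552 (an interpretation that matches the stated total of roughly 1.4 x 10^8 bp), and the function name is hypothetical.

```python
def cloned_base_pairs(clones, insert_kb):
    """Total cloned DNA in base pairs for a library of equal-sized inserts."""
    return clones * insert_kb * 1000


total_bp = cloned_base_pairs(3552, 40)  # ca. 40 kb mean insert
low_bp = cloned_base_pairs(3552, 38)    # lower end of the observed insert range
high_bp = cloned_base_pairs(3552, 42)   # upper end of the observed insert range
```

With 40 kb inserts the total comes to about 1.4 x 10^8 bp, and the 38 to 42 kbp insert range brackets it within a few percent.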

Numerous modifications and variations of the present invention are 
possible in light of the above teachings; therefore, within the scope of the claims, the 
invention may be practiced other than as particularly described. 
Example 3: CsCl-Bisbenzimide Gradients 
Gradient visualization by UV: 

Visualize gradient by using the UV handlamp in the dark room and mark bandings of 
the standard which will show the upper and lower limit of GC-contents. 
Harvesting of the gradients: 

1. Connect Pharmacia-pump LKB P1 with fraction collector (BIO-RAD model
2128).

2. Set program: rack 3, 5 drops (about 100 ul), all samples. 

3. Use 3 microtiter-dishes (Costar, 96 well cell culture cluster). 

4. Push yellow needle into bottom of the centrifuge tube. 

5. Start program and collect gradient. Don't collect first and last 1-2 ml 
depending on where your markers are. 

Dialysis 

1. Follow microdialyzer instruction manual and use Spectra/Por CE Membrane
MWCO 25,000 (wash membrane with ddH2O before usage).

2. Transfer samples from the microtiter dish into microdialyzer (Spectra/Por
MicroDialyzer) with multipipette. (Fill dialyzer completely with TE, get rid of
any air bubbles, transfer samples very fast to avoid new air bubbles.)

3. Dialyze against TE for 1 hr on a plate stirrer.
DNA estimation with PICOGREEN™ 

1. Transfer samples (volume after dialysis should be increased 1.5-2 times) with
multipipette back into microtiter dish.

2. Transfer 100 ul of the sample into Polytektronix plates.

3. Add 100 ul Picogreen-solution (5 ul Picogreen-stock-solution + 995 ul TE
buffer) to each sample.

4. Use WPR-plate-reader.

5. Estimate DNA concentration.

Example 4: Bis-Benzimide Separation of Genomic DNA 

A sample composed of genomic DNA from Clostridium perfringens
(27% G+C), Escherichia coli (49% G+C) and Micrococcus lysodeikticus (72% G+C)
was purified on a cesium-chloride gradient. The cesium chloride (Rf = 1.3980)
solution was filtered through a 0.2 um filter and 15 ml were loaded into a 35 ml
OptiSeal tube (Beckman). The DNA was added and thoroughly mixed. Ten
micrograms of bis-benzimide (Sigma; Hoechst 33258) were added and mixed
thoroughly. The tube was then filled with the filtered cesium chloride solution and
spun in a VTi50 rotor in a Beckman L8-70 Ultracentrifuge at 33,000 rpm for 72
hours. Following centrifugation, a syringe pump and fractionator (Brandel Model
186) were used to drive the gradient through an ISCO UA-5 UV absorbance detector
set to 280 nm. Three peaks representing the DNA from the three organisms were
obtained. PCR amplification of DNA encoding rRNA from a 10-fold dilution of the
E. coli peak was performed with the following primers to amplify eubacterial
sequences:

Forward primer: (27F)

5'-AGAGTTTGATCCTGGCTCAG-3' (SEQ ID NO:1)

Reverse primer: (1492R)

5'-GGTTACCTTGTTACGACTT-3' (SEQ ID NO:2)

Example 5: FACS/Biopanning

Infection of library lysates into Exp503 E. coli strain. 25 ml LB + Tet
culture of Exp503 were cultured overnight at 37°C. The next day the culture was
centrifuged at 4000 rpm for 10 minutes and the supernatant decanted. 20 ml 10 mM
MgSO4 was added and the OD600 checked. Dilute to OD 1.0.

In order to obtain a good representation of the library, at least 2-fold
(and preferably 5-fold) of the library lysate titer was used. For example: Titer of
library lysate is 2x10^6 cfu/ml. Need to plate at least 4x10^6 cfu. Can plate approx.
500,000 microcolonies/150 mm LB-Kan plate. Need 8 plates. Can plate 1 ml of
reaction/plate; need 8 mls of cells + lysate.
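
The plate-count arithmetic in the example above can be written out as a small Python sketch. It is illustrative only: the function and key names are hypothetical, and it follows the example's convention that an N-fold representation means plating N times the per-ml titer.

```python
import math


def plating_plan(titer_cfu_per_ml, fold, colonies_per_plate, ml_per_plate=1.0):
    """Plates and volumes needed to plate `fold` times the lysate titer.

    Assumption: the target cfu is fold x (cfu in 1 ml of lysate), as in the
    worked example in the text.
    """
    target_cfu = fold * titer_cfu_per_ml
    plates = math.ceil(target_cfu / colonies_per_plate)
    lysate_ml = target_cfu / titer_cfu_per_ml
    total_ml = plates * ml_per_plate
    return {
        "target_cfu": target_cfu,
        "plates": plates,
        "lysate_ml": lysate_ml,
        "cells_ml": total_ml - lysate_ml,  # host cells make up the remaining volume
        "total_ml": total_ml,
    }


plan = plating_plan(2e6, 2, 500_000)
```

For the 2-fold example this gives 8 plates, 2 ml of lysate and 6 ml of OD 1.0 host cells; a 5-fold representation under the same assumptions would need 20 plates.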

2-fold (e.g., 2 ml) of library lysate was mixed with an appropriate amount
(e.g., 6 ml) of OD 1.0 Exp503. The sample was incubated at 37°C for at least 1 hour.
Plated 1 ml reaction on 150 mm LB-Kan plate x 8 plates and incubated overnight at
30°C.

Harvesting, induction, and fixing of library in Exp503 cells. Scrape all cells
from plates into 20 ml LB using a rubber policeman. Dilute cells approx. 1:100 (200
ul cells/20 ml LB) and incubate at 37°C until culture is OD 0.3. Add 1:50 dilution of
20% sterile Glucose and incubate at 37°C until culture is OD 1.0. Add 1:100 dilution
of 1 M MgSO4. Transfer 5 ml of culture to a fresh tube and the remaining culture can
be used as an uninduced control if desired or discarded. Add MOI 5 of CE6
bacteriophage to the remaining 5 ml of culture (CE6 codes for T7 RNA Polymerase)
(e.g., OD 1 = 8x10^8 cells/ml x 5 ml = 4x10^9 cells x MOI 5 = 2x10^10 bacteriophage
needed). Incubate culture + CE6 for 2 hr at 37°C. Cool on ice and centrifuge cells at
4000 rpm for 10 min. Wash with 10 ml PBS. Fix cells in 600 ul PBS + 1.8 ml fresh,
filtered 4% paraformaldehyde. Incubate on ice for 2 hrs. (4% Paraformaldehyde:
Heat 8.25 ml PBS in flask at 65°C. Add 100 ul 1 M NaOH and 0.5 g
paraformaldehyde (stored at 4°C). Mix until dissolved. Add 4.15 ml PBS. Cool to
0°C. Adjust pH to 7.2 with 0.5 M NaH2PO4. Cool to 0°C. Syringe filter. Use within
24 hrs.) After fixing, centrifuge at 4000 rpm for 10 min. Resuspend in 1.8 ml PBS
and 200 ul 0.1% NP40. Store at 4°C overnight.
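
The multiplicity-of-infection (MOI) calculation given in the parenthetical example can be sketched as follows. This is a minimal sketch under the text's own conversion of OD 1 = 8x10^8 cells/ml; the function names and the phage stock titer used below are hypothetical.

```python
def phage_needed(od600, culture_ml, moi, cells_per_ml_at_od1=8e8):
    """Number of phage particles needed to infect a culture at a given MOI."""
    if od600 <= 0 or culture_ml <= 0 or moi <= 0:
        raise ValueError("inputs must be positive")
    cells = od600 * cells_per_ml_at_od1 * culture_ml
    return cells * moi


def phage_volume_ml(particles, stock_pfu_per_ml):
    """Volume of phage stock that supplies the requested number of particles."""
    if stock_pfu_per_ml <= 0:
        raise ValueError("stock titer must be positive")
    return particles / stock_pfu_per_ml


needed = phage_needed(od600=1.0, culture_ml=5, moi=5)
volume = phage_volume_ml(needed, stock_pfu_per_ml=1e11)  # 1e11 pfu/ml is a made-up titer
```

The first call reproduces the 2x10^10 phage of the worked example; the second shows how that number translates into a pipetting volume once the CE6 stock has been titered.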

Hybridization of fixed cells. Centrifuge fixed cells at 4000 rpm for 10
min. Resuspend in 1 ml 40 mM Tris pH 7.6/0.2% NP40. Transfer 100 ul fixed cells
to an Eppendorf tube. Centrifuge for 1 min and remove supernatant. Resuspend each
reaction in 50 ul Hybridization buffer (0.9 M NaCl; 20 mM Tris pH 7.4; 0.01% SDS;
25% formamide; can be made in advance and stored at -20°C). Add 0.5 nmol
fluorescein-labeled primer to the appropriate reactions. Incubate with rocking at 46°C
for 2 hr. (Hybridization temperature may depend on sequence of primer and
template.) Add 1 ml wash buffer to each reaction, rinse briefly and centrifuge for 1
min. Discard supernatant. (Wash buffer: 0.9 M NaCl; 20 mM Tris pH 7.4; 0.01%
SDS.) Add another 1 ml of wash buffer to each reaction, and incubate at 48°C with
rocking for 30 min. Centrifuge and remove supernatant. Visualize cells under
microscope using WIB filter.

FACS sorting. Dilute cells in 1 ml PBS. If cells are clumping, sonicate
for 20 seconds at 1.5 power. FACS sort the most highly fluorescent single cells and
collect in 0.5 ml PCR strip tubes (approximately one 96-well plate/library). PCR
single cells with vector specific primers to amplify the insert in each cell.
Electrophorese all samples on an agarose gel and select samples with single inserts.
These can be re-amplified with Biotin-labeled primers, hybridized to insert-specific
primers, and examined in an ELISA assay. Positive clones can then be sequenced.
Alternatively, the selected samples can be re-amplified with various combinations of
insert-specific primers, or sequenced directly.

Example 6: Large Insert FACS Biopanning Protocol

1. Encapsulate 1 vial of 3% home-made SeaPlaque gel. Each vial of gel can
make 10^6 GMD. Take 100 ul of thawed frozen fosmid pMF21/DH10B library,
OD600 = 0.4 to encapsulate, centrifuge down to 10 ul. Melt agarose gel, add
100 ul FBS (fetal bovine serum) and vortex. Place in 50°C water in a beaker.
Add 10 ul culture, vortex and add to 17 ml mineral oil. Shake about 30
times, place on the One Cell machine. Blend at 2600 rpm 1 min at room
temperature and 2600 rpm 9 minutes on ice. Wash with PBS twice. Resuspend
in 10 ml LB + Apr50, shake at 37°C for 4 hours at 230 rpm. Check
microscopically to see the growth and size of microcolonies.

2. Centrifuge at 1500 rpm for 6 min. GMDs are resuspended in 5 ml of 2X SSC and
can be saved at 4°C for several days. Take 200 ul GMD in 2X SSC for each
reaction.

3. Resuspend in 10 ml 2X SSC/5% SDS. Incubate 10 min at RT shaking or
rotating. Centrifuge.

4. Resuspend in 5 ml lysis solution containing proteinase K. Incubate 30 min at
37°C shaking or rotating. Centrifuge.

Lysis Solution:
50 mM Tris pH 8           0.75 ml 1 M Tris
50 mM EDTA                1.5 ml 0.5 M EDTA
100 mM NaCl               300 ul 5 M NaCl
1% Sarkosyl               0.75 ml 20% Sarkosyl
250 ug/ml Proteinase K    375 ul proteinase K stock (10 mg/ml)
                          11.325 ml dH2O



5. Resuspend in 5 ml denaturing solution. Incubate 30 min at RT shaking or
rotating. Centrifuge at 1500 rpm for 5 min.

Denaturing Solution:
0.5 M NaOH/1.5 M NaCl

6. Resuspend in 5 ml neutralizing solution. Incubate 30 min at RT shaking or
rotating. Centrifuge.

Neutralizing Solution:
0.5 M Tris pH 8/1.5 M NaCl

7. Wash in 2X SSC briefly.

8. Aliquot 200 ul/RXN into microcentrifuge tubes, microcentrifuge and take out
the 2X SSC. Add 130 ul "DIG EASY HYB" to prehyb for 45 minutes at 37°C.
Do prehyb and hyb in Personal Hyb Oven.

9. Aliquot oligo probe and denature at 85°C for 5 minutes, place on ice
immediately. Add appropriate amount of probe (0.5-1 nmol/RXN) and return
to rotating hyb oven O/N.

10. Prepare a 1% (10 mg/ml) solution of Blocking Reagent in PBS. Store at 4°C
for use that day.

11. Wash GMDs with 0.8 ml of 2X SSC/0.1% SDS at RT for 15 min, rotating. In the
meantime, prewarm the next wash solution.

12. Wash GMDs with 0.8 ml of 0.5X SSC/0.1% SDS 2 x 15 min at appropriate temp,
rotating. If more stringency is required, the 2nd wash can be done in
0.1X SSC/0.1% SDS.

13. Wash with 0.8 ml/RXN 2X SSC briefly.

14. Block the reaction w/130 ul 1% Blocking Reagent in PBS at RT for 30
minutes.

15. Add 1.4 ul anti-DIG-POD (so 1:100) and incubate at RT for 3 hours.

16. Wash GMDs w/ 0.8 ml PBS/RXN 3 x 7 minutes at 37°C.

17. Prepare a tyramide working solution by diluting the tyramide stock solution
1:85 in Amplification buffer/0.0015% H2O2. Apply 130 ul tyramide working
solution and incubate in the dark at RT for 30 minutes.

18. Wash 3 x 7 min in 0.8 ml PBS buffer at 37°C.

19. Visualize by microscope and FACS sort.
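
The lysis solution recipe in step 4 can be checked against its stated final concentrations with the usual C1·V1 = C2·V2 relation. The Python sketch below is an illustration only; it assumes the NaCl stock is 5 M (the stock concentration is partly illegible in the source) and that the listed volumes make up the whole batch. All names are hypothetical.

```python
def final_concentration(stock_conc, stock_ml, total_ml):
    """Final concentration after dilution (C1*V1 = C2*V2), in the stock's units."""
    return stock_conc * stock_ml / total_ml


# component: (stock concentration, volume added in ml)
components = {
    "Tris (mM)": (1000.0, 0.75),
    "EDTA (mM)": (500.0, 1.5),
    "NaCl (mM)": (5000.0, 0.300),          # assumes a 5 M NaCl stock
    "Sarkosyl (%)": (20.0, 0.75),
    "Proteinase K (ug/ml)": (10000.0, 0.375),
}
water_ml = 11.325

total_ml = water_ml + sum(volume for _, volume in components.values())
finals = {
    name: final_concentration(conc, volume, total_ml)
    for name, (conc, volume) in components.items()
}
```

Under these assumptions the volumes sum to 15 ml and each component lands on its target (50 mM Tris, 50 mM EDTA, 100 mM NaCl, 1% Sarkosyl, 250 ug/ml proteinase K), enough for three 5 ml reactions.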

Example 7: Biopanning Protocol 
Preparing Insert DNA from the Lambda DNA 
PCR amplify inserts using vector specific primers CA98 and CA103. 
CA98: ACTTCCGGCTCGTATATTGTGTGG 
CA103: ACGACTCACTATAGGGCGAATTGGG

These primers match perfectly to lambda ZAP Express clones (pBKCMV).
Reagents: Lambda DNA prepared from the libraries to be panned (Librarians)
Roche Expand Long Template PCR System #1-759-060
Pharmacia dNTP mix #27-2094-01 or
Roche PCR Nucleotide Mix (10 mM) #1-581-295 or
Roche dNTP's - PCR grade #1-969-064

1. Make the insert amplification mix:
X ul dH2O (final 50 ul)
5 ul 10x Expand Buffer #2 (22.5 mM MgCl2)
0.5 or 0.625 ul dNTP mix (20 mM each dNTP)
10 ng (approx) lambda DNA per library (usually 1 ul or 1 ul of a 1:10 diln)
1-2 ul CA98 (100 ng/ul or 15 uM)
1-2 ul CA103 (100 ng/ul or 15 uM)
0.5 ul Expand Long polymerase mix
2. PCR amplify:

Robocycler
95°C    3 minutes     x 1 cycle
95°C    1 minute
65°C    45 seconds    x 30 cycles
68°C    8 minutes
68°C    8 minutes     x 1 cycle
6°C



3. Analyze 5 ul of reaction product on a gel.

Note: The reaction product should be a strong smear of products usually ranging from 
0.5-5 kb in size and centered around 1.5-2 kb. 



Prepare Biotinylated Hook
Reagents: PCR reagents
Biotin-14-dCTP (BRL #19518-018)
Individual dNTP stock solutions (Roche dNTP's #1-969-064)
Gene specific template and primers
PCR purification kit (Roche #1732668 or Qiagen Qiaquick #28106)

1. Make 10x biotin dNTP mix:
150 ul biotin-14-dCTP
3 ul 100 mM dATP
3 ul 100 mM dGTP
3 ul 100 mM dTTP
1.5 ul 100 mM dCTP

2. Make PCR mix:
74 ul water
10 ul 10x Expand Buffer #1
10 ul 10x biotin dNTP mix (step #1)
2 ul Primer #1 (100 ng/ul)
2 ul Primer #2 (100 ng/ul)
1 ul template (gene specific) (100 ng/ul)
1 ul Expand Long polymerase mix



3. PCR amplify:

Robocycler
95°C    3 minutes     x 1 cycle
95°C    45 seconds
* °C    45 seconds    x 30 cycles
68°C    ** minutes
68°C    8 minutes     x 1 cycle
6°C     ∞

* Use an annealing temperature appropriate for your primers.
** Allow 1 minute/kb of target length.

4. Clean up the reaction product using a PCR purification kit. Elute in 50 ul 5T.1E or
Qiagen's EB buffer (10 mM Tris pH 8.5).

5. Check 5 ul on an agarose gel.

Note: The product may be slightly larger than expected due to the incorporation of
biotin.



Biopanning 

Reagents: Streptavidin-conjugated paramagnetic beads (CPG MPG-Streptavidin
10 mg/ml #MSTR0502) (Dynal Dynabeads M-280 Streptavidin)
Sonicated, denatured salmon sperm DNA (heated to 95°C, 5 min)
(Stratagene #201190)
PCR reagents
dNTP mix
Magnetic particle separator
Topo-TA cloning kit with Top10F' comp cells (Invitrogen #K4550-40)
High Salt Buffer: 5 M NaCl, 10 mM EDTA, 10 mM Tris pH 7.3

1. Make the following reaction mix for each library/hook combination:
5 ug insert DNA (PCR amplified lambda DNA)
100 ng Biotinylated hook (100 ng total if using more than one hook)
4.5 ul 20x SSC for a 3x final concentration (or High Salt buffer)
X ul dH2O for a final volume of 30 ul

2. Denature by heating to 95°C for 10 min. (Robocycler works well for this step.)

3. Hybridize at 70°C for 90 min. (Robocycler)

4. Prepare 100 ul of MPG beads for each sample:
Wash 100 ul beads two times with 1 ml 3x SSC
Resuspend in: 50 ul 3x SSC (or High Salt buffer)
10 ul Sonicated, denatured salmon sperm DNA (10 mg/ml) to
block (or 100 ng total)
(Do not ice)

5. Add the hybridized DNA to the washed and blocked beads.

6. Incubate at room temp for 30 min, agitating gently in the hybridization oven.

7. Wash twice at room temp with 1 ml 0.1x SSC/0.1% SDS (or high salt buffer)
using magnetic particle separator.

8. Wash twice at 42°C with 1 ml 0.1x SSC/0.1% SDS (or high salt buffer) for 10 min
each. (magnet)

9. Wash once at room temp with 1 ml 3x SSC. (magnet)

10. Elute DNA by resuspending the beads in 50 ul dH2O and heating the beads to
70°C for 30 min or 85°C for 10 min in the hyb oven (or thermomixer at 500 rpm).
Separate using magnet, and discard the beads.

11. PCR amplify 1-5 ul of the panned DNA using the same protocol as Preparing
Insert DNA from the Lambda DNA above.

12. Check 5 ul on agarose gel.






Note: The reaction product should be a strong smear of products usually ranging from 
0.5-5 kb in size and centered around 1.5-2 kb. 



13. Clone 1-4 ul into pCR2.1-TopoTA cloning vector.

14. Transform 2 x 3 ul into Top10F' chemically comp cells. Plate each transformation
on 2 x 150 mm LB-kan plates. Incubate at 30°C overnight.
(Ideal density is ~3000 colonies per plate.)
Repeat transformation if necessary to get a representative number of colonies per
library. Archive the Biopanned DNA.

15. Transfer plates to Hybridization group, along with appropriate templates and a
single primer for run off PCR 32P-labeling reactions.
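
The reaction mix in step 1 of the biopanning procedure leaves the water volume as "X". A short Python sketch of how X could be worked out is given below. It is an assumption-laden illustration: the stock concentrations of insert DNA and hook passed in are hypothetical, as are the function and key names; only the 5 ug, 100 ng, 20x to 3x SSC and 30 ul figures come from the text.

```python
def biopanning_mix(insert_ug_per_ul, hook_ng_per_ul, final_ul=30.0,
                   insert_ug=5.0, hook_ng=100.0,
                   ssc_stock_x=20.0, ssc_final_x=3.0):
    """Volumes (ul) for one library/hook hybridization reaction."""
    insert_ul = insert_ug / insert_ug_per_ul
    hook_ul = hook_ng / hook_ng_per_ul
    ssc_ul = ssc_final_x * final_ul / ssc_stock_x   # C1*V1 = C2*V2
    water_ul = final_ul - insert_ul - hook_ul - ssc_ul
    if water_ul < 0:
        # DNA stocks too dilute to fit in the final volume; concentrate first.
        raise ValueError("components exceed the final reaction volume")
    return {"insert_ul": insert_ul, "hook_ul": hook_ul,
            "ssc_ul": ssc_ul, "water_ul": water_ul}


mix = biopanning_mix(insert_ug_per_ul=0.5, hook_ng_per_ul=50.0)
```

The SSC volume comes out at the 4.5 ul stated in the protocol; the error branch flags insert preparations that are too dilute to deliver 5 ug within 30 ul.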

Analysis of Results 

1. Filter lifts from plates will be performed, and hybridized to the appropriate probe.
Resultant films will be given to the Biopanner.

2. Align films to original colony plates. Colonies corresponding to positive
"dots-on-film" should be toothpicked, patched onto an LB-Kan plate, and inoculated
in 4 ml TB-Kan. For automation, inoculate 1 ml TB-kan in a 96-well plate and
incubate 18 hrs. at 37°C.

3. Overnight cultures are mini-prepped (Biomek if possible). Digest with EcoRI to
determine insert size.
2 ul DNA
0.5 ul EcoRI
1 ul 10x EcoRI buffer
6.5 ul dH2O
Incubate at 37°C for 1 hr. Check insert size on agarose gel.
Large insert clones (>500 bp) are then PCR confirmed if possible with gene specific
primers.

4. Putative positive clones are then sequenced.

5. Glycerol stocks should be made of all interesting clones (>500 bp).

Example 8: HIGH THROUGHPUT CULTIVATION OF MARINE MICROBES
FROM SEA SAMPLE

17. Preparation of cell suspension

Cells were obtained after filtering 110 L of surface water through a 0.22 um
membrane. The cell pellet was then resuspended with seawater and a volume of 100
uL was used for cell encapsulation. This provided cell numbers of approximately 10
cells per mL.

18. Cell encapsulation into GMDs

The following reagents were used: CelMix™ Emulsion Matrix and CelGel™
Encapsulation Matrix (One Cell Systems, Inc., Cambridge, MA), Pluronic F-68
solution and Dulbecco's Phosphate Buffered Saline (PBS, without Ca2+ and Mg2+).
Scintillation vials each containing 15 ml of CelMix™ emulsion matrix were placed in
a 40°C water bath and were equilibrated to 40°C for a minimum of 30 minutes. 30 ul
of Pluronic Solution F-68 (10%) was added to each of 6 vials of melted CelGel™
agarose. The agarose mixture was incubated at 40°C for a minimum of 3 minutes.
100 ul of cells (resuspended in PBS) were added per 6 vials of CelGel™
and the resulting mixture was incubated at 40°C for 3 minutes. Using a 1 ml pipette
and avoiding air bubbles, the CelGel™-cell mixture was added dropwise to the
warmed CelMix™ in the scintillation vial. This mixture was then emulsified using
the CellSys100™ MicroDrop maker as follows: 2200 rpm for 1 minute at room
temperature (RT), then 2200 rpm for 1 minute on ice, then 1100 rpm for 6 minutes on
ice, resulting in an encapsulation mixture comprised of microdrops that were
approximately 10-20 microns in diameter. The encapsulation mixture was then
divided into two 15 ml conical tubes and in each tube, the emulsion was overlaid
with 5 ml of PBS. The tubes were then centrifuged at 1800 rpm in a bench top
centrifuge for 10 minutes at RT, resulting in a visible Gel MicroDrop (GMD) pellet.
The oil phase was then removed with a pipette and disposed of in an oil waste
container. The remaining aqueous supernatant was aspirated and each pellet was
resuspended in 2 ml of PBS. Each resuspended pellet was then overlaid with 10 ml
of PBS. The GMD suspension was then centrifuged at 1500 rpm for 5 minutes at RT.
The overlaying process was repeated and the GMD suspension was centrifuged again
to remove all free-living bacteria. The supernatant was then removed and the pellet was
resuspended in 1 ml of seawater. 10 ul of the GMD suspension was then examined
under the microscope in order to check for uniform GMD size and containment of
the encapsulated organism in the GMD. This protocol resulted in 1 to 4 cells
encapsulated in each GMD.

1 9. Sorting of GMDs containing single cells for identification by 1 6S rRNA gene 
20 sequence 

On the first day of cultivation we sorted occupied GMDs that contained one
to 4 cells, although most had only single cells. The sorting was done in a Mo-Flo
instrument (Cytomation) by staining the cells inside the GMDs with Syto9 and then
selecting green fluorescence (from the stain) and side-scatter as parameters for sorting
gates. The staining was necessary since the cells are much smaller than E. coli and
therefore show very low light-scatter signals. The target GMDs were sorted into a
96-well plate containing a PCR mixture and were ready to be amplified immediately after
sorting. We used a Hotstart enzyme (Qiagen) so that no reaction would occur before
boiling for 15 min, which allowed work at room temperature before
amplification. Before starting the PCR it was necessary to irradiate the PCR mixture
with a Stratalinker (Stratagene) at full power for 14 min to cross-link any potential
genomic DNA present in the mixture before sorting. The primers used include the pairs
27F and 1392R, and 27F and 1522R, named according to the positions in the E. coli gene
sequence. The primers were obtained from IDT-DNA Technologies and were purified
by HPLC. The primer concentration used in the reactions was 0.2 µM. We used a
"touchdown" program consisting of 3 stages: a) boiling for 15 min, b) 15 cycles
decreasing the annealing temperature from 62 to 55°C by 0.5 degrees per cycle, c) a
series of cycles (20-40) increasing the annealing time 1 sec per cycle starting with 30
sec but keeping the temperature constant at 55°C. All the other stages of the PCR
were as recommended by the manufacturer. This protocol allowed the amplification of
the 16S rRNA gene from individual encapsulated cells or small consortia of cells. The
PCR products were then cloned into TOPO-TA (Invitrogen) cloning vectors and
sequenced by dye-termination cycle sequencing (Perkin-Elmer ABI).
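
The three-stage touchdown program can be written out cycle by cycle. The following Python sketch is illustrative only: the function name touchdown_schedule, its parameters, and the 30 sec annealing time assumed for stage b are not part of the protocol, which states an annealing time only for stage c.

```python
def touchdown_schedule(stage_c_cycles=30, stage_b_anneal_s=30):
    """Return (stage, cycle, annealing_temp_c, annealing_time_s) tuples.

    Stage a (15 min initial boiling) is a single hold and is not listed.
    stage_b_anneal_s is an assumption; the protocol gives no stage b time.
    """
    if not 20 <= stage_c_cycles <= 40:
        raise ValueError("stage c is described as 20 to 40 cycles")
    schedule = []
    # Stage b: 15 cycles, annealing at 62 C stepping down to 55 C
    # in 0.5 degree decrements (62 - 14 * 0.5 = 55).
    for i in range(15):
        schedule.append(("b", i + 1, 62.0 - 0.5 * i, stage_b_anneal_s))
    # Stage c: annealing held at 55 C; annealing time starts at 30 s
    # and increases by 1 s with each cycle.
    for i in range(stage_c_cycles):
        schedule.append(("c", i + 1, 55.0, 30 + i))
    return schedule
```

Calling touchdown_schedule(20) lists 35 annealing steps in total: 15 for stage b followed by 20 for stage c.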
Cell growth of encapsulated cells inside GMDs 

The GMDs containing encapsulated cells were placed into chromatography columns that
allowed the flow of culture media, providing nutrients for growth and washing out
waste products from the cells. The experiment consisted of 4 treatments, including the use
of seawater and amendments (inorganic nutrients including trace metals and
vitamins, amino acids including trace metals and vitamins, and diluted rich organic
marine media). These different sets of nutrients provided a gradient to bias different
microbial populations. The seawater used as the base for the media was filter sterilized
through 1000 kDa and 0.22 µm filter membranes prior to amendment and
introduction to the columns. The cells were then incubated for a period of 17 weeks
and cell growth was monitored by phase contrast microscopy. Cell identification was
done by 16S rRNA gene sequencing of grown colonies.

20. Sorting of GMDs containing colonies consisting of one or more cell types

To identify the diversity and the community composition of the different
treatments we performed a "bulk sorting" of the GMDs. This was done by taking a
subsample of the GMDs from each column and running it through the flow cytometer. We
selected forward- and side-scatter as gating criteria, since occupied GMDs with a colony
of 10 or more cells, of individual cell sizes ranging from 0.5 to 5 µm, were easy to
discriminate from empty GMDs. We verified each time by phase contrast microscopy
that we had selected the correct gate for sorting. We then sorted a total of 300 GMDs
for each individual PCR reaction (prepared as above) and ran the reaction in a
thermocycler for a total of 50 to 60 cycles to obtain enough PCR product to be
visualized by gel electrophoresis. The resulting PCR reactions from the same column
were combined (2 to 4 replicates), cloned and sequenced as above to assess the
phylogenetic diversity from each column and observe the bias effect resulting from
the use of different nutrient regimes.
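
The scatter-based gate used for bulk sorting can be sketched as a simple rectangular gate. This is a minimal illustration, not the instrument's software: the Event type, the thresholds fsc_min and ssc_min, and the function names are hypothetical, and no threshold values are given in the text. In practice the gate position was checked by phase contrast microscopy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One particle seen by the cytometer (arbitrary scatter units)."""
    forward_scatter: float
    side_scatter: float


def in_gate(event, fsc_min, ssc_min):
    # Occupied GMDs are assumed to scatter more than empty ones on both axes.
    return event.forward_scatter >= fsc_min and event.side_scatter >= ssc_min


def sort_occupied(events, fsc_min, ssc_min, target=300):
    """Collect gated events until `target` GMDs (300 per PCR reaction) are sorted."""
    sorted_events = []
    for event in events:
        if in_gate(event, fsc_min, ssc_min):
            sorted_events.append(event)
            if len(sorted_events) == target:
                break
    return sorted_events
```

The default target of 300 matches the number of GMDs sorted per PCR reaction; the thresholds would have to be set empirically for each run.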

Gene sequencing and phylogenetic analyses 

The gene sequences were aligned and compared to our 16S rRNA database 
with the ARB phylogenetic program. Maximum Parsimony and neighbor joining trees 
were constructed using the amplified gene sequences (approximately 1400 bp). 
Example 9: Microextraction Procedure

Single copies of Streptomyces containing clones from a mixed population are
FACS-sorted onto agar, allowed to develop into individual colonies, and bioassayed as
individual clones.

CONSTRUCTION OF A CLONE EXPRESSING A BIOACTIVE METABOLITE
A genomic library of Streptomyces murayamaensis is constructed in the pJ0436
vector (Bierman et al., Gene 1991, 116:43-49) and hybridized with probes for
polyketide synthase. A clone (IB) which hybridized was chosen and shuttled into
Streptomyces venezuelae ATCC 10712 strain. The vector pMF17 was also introduced
into S. diversa as a negative control. When bioassayed on solid media, clone IB
expressed strong bioactivity towards Micrococcus luteus, demonstrating that the insert
present in clone IB encoded a bioactive polyketide molecule.
FACS-sorting of S. venezuelae clones 

The S. venezuelae exconjugant spores containing clone IB, as well as pJ0436 vector,
are FACS-sorted in 48-well, 96-well, and 384-well format into corresponding plates
containing MYM agar + apramycin 50 µg/ml. The single spore clones were allowed
to germinate, grow and sporulate for 4-5 days.

Natural product extraction procedure: After the clones were fully grown and
sporulated for 4-5 days, the following volumes of methanol solvent were added to
each well containing the clones.

48 well format: 0.8 ml
96 well format: 0.100 ml
384 well format: 0.06 ml

The plates were incubated at room temperature overnight.

The next day, the following volumes were recovered from the wells containing the 
clones. 

48 well format: 0.3 ml
96 well format: 0.060 ml
384 well format: 0.030 ml

The extracts were assayed from a single well, and after combining extracts from 2, 4
and 10 wells. The methanol extract was dried and resuspended in 40 µl of
methanol:water, 20 µl of which was assayed against M. luteus as the indicator
strain.
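
The volumes above imply the fraction of solvent recovered per well and the share of a pooled, resuspended extract that reaches the bioassay. The short Python sketch below works through that arithmetic. The names are illustrative, and it assumes that drying and resuspension lose no material, which the procedure does not state.

```python
# Volumes in ml, taken from the procedure above, keyed by plate format.
METHANOL_ADDED_ML = {48: 0.8, 96: 0.100, 384: 0.06}
EXTRACT_RECOVERED_ML = {48: 0.3, 96: 0.060, 384: 0.030}


def recovery_fraction(plate_format):
    """Fraction of the added methanol that was recovered from a well."""
    return EXTRACT_RECOVERED_ML[plate_format] / METHANOL_ADDED_ML[plate_format]


def well_equivalents_assayed(wells_combined, resuspend_ul=40.0, assayed_ul=20.0):
    """Recovered-extract equivalents (in wells) loaded into one bioassay.

    The pooled extract is dried, resuspended in 40 ul, and 20 ul is assayed,
    so half of the pooled material is tested.
    """
    return wells_combined * assayed_ul / resuspend_ul
```

Under these assumptions, an assay on a single well tests the recovered extract of half a well, while pooling 10 wells tests the equivalent of 5 wells.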

A single colony of S. venezuelae containing clone IB produced enough bioactive
molecule, in 48-well, 96-well as well as 384-well format, to be extracted by the 
microextraction procedure and to be detected by bioassay. 
Example 11: Expression of actinorhodin pathway in S. venezuelae 10712 
When a Sau3A pIJ2303 library constructed in pJ0436 was introduced into S.
venezuelae, one exconjugant which appeared blue-grey in color was spotted. This
exconjugant showed blue pigment on R2-S agar, demonstrating the successful
expression of a heterologous pathway (actinorhodin) in S. venezuelae.
Segregational stability of S. venezuelae 10712 (pJ0436::actinorhodin)

Since Streptomyces clones for small molecule production are grown in 
absence of antibiotic selection, it was important to determine how stable the S. 
venezuelae pJ0436 recombinant clones are. The S. venezuelae 10712 
(pJ0436::actinorhodin) clone was used as an example. 

The act clone was grown in R2-S liquid cultures with and without 
apramycin and total cell count was done by plating on R2-S agar with and without 
apramycin. The act clone gave 100% and 96% apramycin resistant colonies when 
grown with and without apramycin, respectively. This demonstrates that S. 
venezuelae pJ0436 clones are quite stable segregationally. 
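
The stability figures are ratios of colony counts on plates with and without apramycin. A minimal sketch of that calculation follows; the function name and the example counts are hypothetical, since only the resulting percentages are reported above.

```python
def percent_resistant(colonies_with_apramycin, colonies_without_apramycin):
    """Percentage of plated cells that retained apramycin resistance.

    Both arguments are colony counts from equal platings of the same culture:
    one on R2-S agar with apramycin, one on R2-S agar without.
    """
    if colonies_without_apramycin <= 0:
        raise ValueError("need a positive total colony count")
    return 100.0 * colonies_with_apramycin / colonies_without_apramycin
```

For example, hypothetical counts of 96 colonies with apramycin and 100 without would correspond to the 96% figure reported for the culture grown without selection.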
Expression stability of S. venezuelae 10712 (pJ0436::actinorhodin) 

Expression of the actinorhodin gene cluster in S. venezuelae 10712 has
been demonstrated. However, when this clone was grown in liquid cultures it failed to
produce actinorhodin, as determined by the absence of its blue color. Nonetheless,
when mycelia from such cultures were plated on solid media, actinorhodin-producing
colonies were clearly evident. The majority of the colonies produced a faint blue color
while a few colonies produced abundant actinorhodin. The colonies that produce
actinorhodin abundantly have been named HBC (hyper blue clones).

These observations suggest that in HBC clones a host
mutation has occurred which allows very efficient actinorhodin expression. Mutations
which could lead to efficient actinorhodin expression could include a variety of
targets, such as elimination of negative regulators like cutRS, overexpression of
positive regulators, or efficient expression of pathways which provide precursors for
actinorhodin. The hyperproduction of actinorhodin by the HBC clones thus strongly
indicates that it is indeed possible for us to construct a strain which is more
optimized for heterologous expression of small molecules, by random mutagenesis or
by specific cutRS knockout mutagenesis.
Construction of a jadomycin blocked mutant of S. venezuelae 

Orf1 of the jadomycin biosynthetic gene cluster was chosen as a target.
Primers were designed so as to amplify jad-L and jad-R fragments with proper
restriction sites for future subcloning. S. venezuelae is reasonably sensitive to
hygromycin and therefore the hygromycin resistance gene will be used to disrupt the
orf-1 gene. The strategy used for disrupting the jadomycin orf-1 is described in the
attached figure. The hyg-disrupted copy of the orf-1 gene will then be placed on
pKC1218 and used for gene replacement in the S. venezuelae 10712, as well as
VS153, chromosome.

Expression of the yellow clone in S. venezuelae 

The single arm rescue technique to recover the yellow clone insert
from S. lividans clone 525Sm575 was described. The recovered clone #3 was mated
into S. venezuelae 10712 as well as VS153. Yellow color was evident after several
days on both 10712 and VS153 plates but absent in the pJ0436 vector-alone
controls. Three 10712 yellow clones were grown in liquid R2-S medium and all three
produced yellow color profusely. This experiment has validated S. venezuelae as a
host and pJ0436 as the vector for heterologous expression for the second time, the
first time being with the actinorhodin gene cluster. This yellow clone insert could now
be used in validation of different strains in our strain improvement program.

3. Development of a mating protocol in a microtiter plate format.

In order to have the individual E. coli donor clones archived, we are
attempting to develop a mating protocol in a microtiter plate format. According to this
protocol, we plan to sort the E. coli library into a 96-well microtiter plate. The
matings with S. diversa would then be done on an R2-S agar plate in an array format
corresponding to the 96-well microtiter plate containing the E. coli clones. The
bioassays can either be conducted on the mating R2-S plate, or the clones can first be
replica plated onto another suitable agar plate and then bioassayed. This approach
will allow us to go back to the E. coli clones once we detect a bioactive clone among
the S. diversa exconjugant library. The E. coli clone can then be mated back into S.
diversa for re-transformation and confirmation of the bioactivity.

In a preliminary experiment, matings were done by spotting S. diversa
spores together with E. coli donor cells on an R2-S agar plate (rather than spreading).
After about 8 hours the plate was overlaid as usual with apramycin and nalidixic
acid. The exconjugants appeared only on those spots where the E. coli donor was added,
but not on those spots containing S. diversa spores alone. These initial data are very
promising, although some more standardization needs to be done to develop this
technique fully.

Example 12: Production of single cells or fragmented mycelia
In order to produce single cells or fragmented mycelia, 25 ml of MYM
media (see recipe below) was inoculated in a 250 ml baffled flask with 100 µl of
Streptomyces 10712 spore suspension and incubated overnight at 30°C, 250 rpm.
After a 24 hour incubation, 10 ml was transferred to a 50 ml conical polypropylene
centrifuge tube and centrifuged at 4,000 rpm for 10 minutes at 25°C. The supernatant was
decanted and the pellet was resuspended in 10 ml of 0.05 M TES buffer. The cells were
sorted onto MYM agar plates (1 cell per drop, 5 cells per drop, or 10 cells per drop)
and the plates were incubated at 30°C.

MYM media (Stuttard, 1982, J. Gen. Microbiol. 128:115-121)
contains: 4 g maltose, 10 g malt extract, 4 g yeast extract, 20 g agar, pH 7.3, water to 1 L.
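
The per-liter recipe can be scaled to the working volumes used above, such as the 25 ml liquid culture. The helper below is a sketch only: scale_mym and its include_agar flag are illustrative names, and leaving out agar for liquid medium is an assumption, since the recipe as given lists agar.

```python
# Grams per liter, from the MYM recipe above.
MYM_G_PER_LITER = {
    "maltose": 4.0,
    "malt extract": 10.0,
    "yeast extract": 4.0,
    "agar": 20.0,
}


def scale_mym(volume_ml, include_agar=True):
    """Return grams of each component for `volume_ml` of MYM medium."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    factor = volume_ml / 1000.0
    return {
        name: grams * factor
        for name, grams in MYM_G_PER_LITER.items()
        if include_agar or name != "agar"
    }
```

For instance, scale_mym(25, include_agar=False) gives the component masses for the 25 ml flask culture, and scale_mym(1000) reproduces the recipe as written. The pH adjustment to 7.3 is not modeled.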
Example 13: An exemplary method for the discovery of novel enzymes

The following describes a method for the discovery of novel enzymes
requiring large substrates (e.g., cellulases, amylases, xylanases) using the ultra high
throughput capacity of the flow cytometer. As these substrates are too large to get
into a bacterial cell, a strategy other than single intracellular detection must be
employed in order to use the flow cytometer. For this purpose, we have adapted the
gel microdrop (GMD) technology (One Cell Systems, Inc.). Specifically, the enzyme
substrate is captured within the GMD and the enzyme allowed to hydrolyze the 
substrate within this microenvironment. However, this method is not limited to any 
particular gel microdrop technology. Any microdrop-forming material that can be 
derivatized with a capture molecule can be used. The basic experimental design is as 
follows: Encapsulate individual bacteria containing DNA libraries within the GMDs 
and allow the bacteria to grow to a colony size containing hundreds to thousands of 
cells each. The GMDs are made with agarose derivatized with biotin, which is 
commercially available (One Cell Systems). After appropriate colony growth, 
streptavidin is added to serve as a bridge between a biotinylated substrate and the 
biotin-labeled agarose. Finally, the biotinylated substrate will be added to the GMD 
and captured within the GMD through the biotin-streptavidin-biotin bridge. The 
bacterial cells will be lysed and the enzyme released from the cells. The enzyme will 
catalyze the hydrolysis of the substrate, thereby increasing the fluorescence of the 
substrate within the GMD. The fluorescent substrate will be retained within the GMD
through the biotin-streptavidin-biotin bridge and thus, will allow isolation of the 
GMD based on fluorescence using the flow cytometer. The entire microdrop will be 
sorted and the DNA from the bacterial colony recovered using PCR techniques. This 
technique can be applied to the discovery of any enzyme that hydrolyzes a substrate 
with the result of an increased fluorescence. Examples include but are not limited to 
glycosidases, proteases, lipases, ferulic acid esterases, secondary amidases, and the
like. 

One system uses a biotin capture system to retain secreted antibodies 
within the GMD. The system is designed to isolate hybridomas that secrete high 
levels of a desired antibody. This basic design is to form a biotin-streptavidin-biotin 
sandwich using the biotinylated agarose, streptavidin, and a biotinylated capture 
antibody that recognizes the secreted antibody. The "captured" antibody is detected 
by a fluoresceinated reporter antibody. The flow cytometer is then used to isolate the 
microdrop based on increased fluorescence intensity. The potentially unique aspect of
the method described here is the use of large fluorogenic substrates for the
determination of enzyme activity within the GMD. Additionally, this example uses 
bacterial cells containing DNA libraries instead of eukaryotic cells and is not confined 
to secreted proteins as the bacterial cells will be lysed to allow access to the enzymes. 

The fluorogenic substrates can be easily tailored to the particular 
enzyme of interest. Described below is a specific example of the chemical synthesis 
of an esterase substrate. Additionally, two examples are given which describe the 
different possible chemical combinations that can be used to make a wide variety of 
substrates. 

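Example of Reaction Sequence Leading to GMD-Attachable Substrate

[Reaction scheme not reproduced; the only structural fragments that survive show C6H13
substituents. Steps 1 to 4 of the sequence are described in the text that follows.]
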
In the first step, 1-amino-11-azido-3,6,9-trioxaundecane [Reference 3], an asymmetric
spacer, is attached to the N-hydroxysuccinimide ester of 5-carboxyfluorescein
(Molecular Probes). After reduction of the azide functional group on the end of the
attached spacer (step 2), activated biotin (Molecular Probes) is attached to the amine
terminus (step 3), and the sequence is completed by esterification of phenolic groups
of the fluorescein moiety (step 4). The resulting compound can be used as a substrate
in screens for esterase activity.

Design of GMD-Attachable Fluorogenic Substrates

Fluor - core fluorophore structure, capable of forming fluorogenic derivatives, e.g.
coumarins, resorufins, xanthenes, and others.

Spacer - a chemically inert moiety providing connection between the biotin moiety and
the fluorophore. Examples include alkanes and oligoethyleneglycols. The choice of
the type and length of the spacer will affect synthetic routes to the desired products,
physical properties of the products (such as solubility in various solvents), and the
ability of biotin to bind to deep pockets in avidin.

C1, C2, C3, C4 - connector units, providing covalent links between the core
fluorophore structure and other moieties. C1 and C2 affect the specificity of the
substrates towards different enzymes. C3 and C4 determine stability of the desired
product and synthetic routes to it. Examples include ether, amine, amide, ester, urea,
thiourea, and other moieties.

R1 and R2 - functional groups, attachment of which provides for quenching of
fluorescence of the fluorophore. These groups determine the specificity of substrates
towards different enzymes. Examples include straight and branched alkanes, mono-
and oligosaccharides, unsaturated hydrocarbons and aromatic groups.

a. Design of GMD-Attachable Fluorescence Resonance Energy Transfer
Substrates

Fluor - a fluorophore. Examples include acridines, coumarins, fluorescein,
rhodamine, BODIPY, resorufin, porphyrins, etc.

Quencher - a moiety which is capable of quenching fluorescence of the fluorophore
when located at a close enough distance. Quencher can be the same moiety as the
fluorophore or a different one.

Polymer - a moiety consisting of several blocks, a bond between which can be
cleaved by an enzyme. Examples include amines, ethers, esters, amides, peptides, and
oligosaccharides.

C1 and C2 are equivalent to C3 and C4 in the previous design.
Spacer is equivalent to Spacer in the previous design.
References: 

[1] Gray, F., Kenney, J.S., Dunne, J.F. Secretion capture and report web: use of
affinity derivatized agarose microdroplets for the selection of hybridoma cells. J.
Immunol. Meth. 1995, 182, 155-163.

[2] Powell, K.T. and Weaver, J.C. Gel microdroplets and flow cytometry: Rapid
determination of antibody secretion by individual cells within a cell population.
Bio/technology 1990, 8, 333-337.

[3] Schwabacher, A.W., Lane, J.W., Schiesher, M.W., Leigh, K.M., Johnson, C.W.
J. Org. Chem. 1998, 63, 1727-1729.

Example 14: An exemplary ultra high throughput screen: a recombinant approach
This example demonstrates an ultra high throughput screen for the
discovery of novel anticancer agents. This method uses a recombinant approach to
the discovery of bioactive molecules. The examples use complex DNA libraries from
a mixed population of uncultured microorganisms that provide a vast source of natural
products through recombinant expression from whole gene pathways. The two
objectives of this Example are:

1) Engineering of mammalian cell lines as reporter cells for cancer targets to be
used in an ultra-high throughput assay system.

2) Detection of novel anticancer agents using an ultra high throughput FACS-based
screening format.

The present invention provides a new paradigm for screening technologies that brings
the small molecule libraries and target together in a three dimensional ultra high
throughput screen using the flow cytometer. In this format, it is possible to achieve
screening rates of up to 10^8 per day. The feasibility of this system is tested using
assays focused on the discovery of novel anti-cancer agents in the areas of signal
transduction and apoptosis. Development of a validated assay should have a profound
impact on the rate of discovery of novel lead compounds.

Experimental Design and Methods
1. Development of cell lines

The goal of this example is to develop an ultra high throughput
screening format that can be used to discover novel chemotherapeutic agents active
against a range of molecular targets known to be important in cancers. The feasibility
of this approach will be tested using mammalian cell lines that respond to activation
of the epidermal growth factor receptor (EGFR) with induction of expression of a
reporter protein. The EGFR-responsive cells will be brought together with our
microbial expression host within a microdrop (see Example 13 and co-pending U.S.
patent 6,280,926, and U.S. application Serial No. 09/894,956, both herein
incorporated by reference). These expression hosts will be Streptomyces or E. coli
and will contain libraries derived from a mixed population of organisms, i.e. high
molecular weight environmental DNA (10-100 kb fragments) cloned into the
appropriate vectors and transferred to the host. These large DNA fragments will
contain biosynthetic operons which consist of the genes necessary to produce a
bioactive small molecule. A bioactive molecule from the microbial host will elicit a
biological response in the mammalian cell which will induce expression of a
fluorescent reporter. The entire microdrop will be individually sorted on the flow
cytometer based on fluorescence and the DNA from the host recovered. The mixed
population libraries may contain from 10^4 to 10^10 clones, including 10^5, 10^6, 10^7, 10^8,
10^9, or any multiple thereof.
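
To relate these library sizes to the stated screening rate of up to 10^8 per day, the following sketch estimates the sorting time for a library. It also estimates the fraction of distinct clones seen at least once, under an assumption that is not stated in the text: clones are equally represented and sampled at random, so that k-fold oversampling covers 1 - e^(-k) of the library. The function names are illustrative.

```python
import math


def days_to_screen(library_size, fold_coverage=1.0, events_per_day=1e8):
    """Days of sorting needed to analyze library_size * fold_coverage events."""
    if library_size <= 0 or fold_coverage <= 0 or events_per_day <= 0:
        raise ValueError("all arguments must be positive")
    return library_size * fold_coverage / events_per_day


def fraction_sampled(fold_coverage):
    """Expected fraction of distinct clones seen at least once.

    Assumes random sampling of equally represented clones (Poisson model).
    """
    return 1.0 - math.exp(-fold_coverage)
```

Under this model, a library at the upper end of the range (10^10 clones) would take about 100 days for a single pass at the maximum rate, and a single pass would be expected to sample roughly 63% of the distinct clones.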

An assay based on the EGF receptor was chosen because of its possible
role in the pathogenesis of several human cancers. The EGF-mediated signal
transduction pathway is very well characterized and several inhibitors of the EGF
receptor have been found from natural sources (21,22). The EGFR is one of the early
oncogenes discovered (erbB) from the avian erythroblastosis retrovirus and, due to a
deletion of nearly all of the extracellular domain, is constitutively active (23). Similar
types of mutations have been found in 20-30% of cases of glioblastoma multiforme, a
major human brain tumor (24). Overexpression of EGFR correlates with a poor
prognosis in bladder cancer (25), breast cancer (26,27), and glioblastoma multiforme
(28). Most of these cancers occur in an EGF-secreting background, demonstrating
an autocrine growth mechanism in these cancers. Additionally, EGFR is overexpressed
in 40-80% of non-small cell lung cancers and EGF is overexpressed in half
of primary lung cancers, with patient prognosis significantly reduced in cases with
concurrent expression of EGFR and EGF (29,30). For these reasons, inhibitors of the
EGF receptor are potentially useful as chemotherapeutic agents for the treatment of
these cancers.

The goal of this experiment is to create mammalian cell lines that serve
as reporter cells for anticancer agents. HeLa cells endogenously express the EGFR as
confirmed by FACS analysis using the anti-EGFR antibody, Ab-1 (Calbiochem). In
contrast, CHO cells have little or no expression of the EGFR. The gene encoding
EGFR was obtained from Dr. Gordon Gill (University of California, San Diego) and
cloned into the pcDNA3/hygro vector. The resulting vector was transfected into
CHO cells and stable transformants were selected with hygromycin. Enrichment of high
EGFR-expressing CHO cells was performed through two rounds of FACS sorting
using the anti-EGFR antibody. For detection of the activated pathway, a parallel
approach is being taken utilizing both the PathDetect system from Stratagene (San
Diego, CA) and the Mercury Profiling system from Clontech (San Diego, CA). The
PathDetect system has been validated by researchers as a means of detecting
mitogenic stimuli (31,32).

The EGFR is a tyrosine kinase receptor that functions through the
MAP-kinase pathway to activate the transcription factor Elk-1 (33). The PathDetect
product includes a fusion trans-activator plasmid (pFA-Elk1) that encodes
a fusion protein containing the activation domain of the Elk-1
transcription activator and the DNA binding domain of the yeast GAL4. A second
plasmid contains a synthetic promoter with five tandem repeats of the yeast GAL4
binding sites that control expression of the Photinus pyralis luciferase gene. The
luciferase gene was removed and replaced with the gene encoding the destabilized
version of the enhanced green fluorescent protein (EGFP) (plasmid designated
pFR-d2EGFP). The two plasmids were transfected together into the EGFR/CHO and
HeLa cells at a ratio of 10:1 (pFR-EGFP:pFA-Elk1) and stable transformants were selected
using the neomycin resistance gene located on the pFA-Elk1 plasmid. Thus, ligand
binding to the EGFR will initiate a signal transduction cascade that results in
activation of the Elk1 portion of the fusion protein, allowing the DNA binding domain
of the yeast GAL4 to bind to its promoter and turn on expression of EGFP.

Stimulation in the presence of serum is not surprising as this signal
transduction pathway is common to most growth factors and it is likely that many
growth factors including EGF are present in the serum. After 24 hours of significant
serum starvation, this response is greatly reduced (Figure 2A). The next step will be
to selectively stimulate these cells with recombinant EGF (Calbiochem) and isolate
the highly responsive single clones using the flow cytometer. These clones will be
selected by sorting simultaneously for high levels of GFP and the EGFR. The EGFR
will be detected using an anti-EGFR antibody with a secondary antibody labeled with
phycoerythrin. This system has the advantage that use of the yeast GAL4 promoter in
these cells should keep background or spurious induction of EGFP to a minimum.

The second group of cell lines uses the Mercury Profiling system to
assay the same EGFR pathway. This system responds to activation of the pathway
with an increase in the expression of human placental secreted alkaline phosphatase
(SEAP). A fluorescent signal will be obtained by the addition of the phosphatase
substrate ELF-97-phosphate (Molecular Probes), which yields a bright fluorescent
precipitate upon cleavage. The advantage of this approach over the PathDetect
system is the ability to amplify the signal through enzyme catalysis for low-level
activation of the pathway. This parallel approach will increase the probability of
success in finding bioactive compounds. In the Mercury Profiling system, a vector
containing the cis-acting enhancer element SRE and the TATA box from the
thymidine kinase promoter is used to drive expression of alkaline phosphatase
(pTA-SEAP). This system relies on the endogenous transactivators present in the cell, such
as Elk-1, to bind the SRE element on the vector and drive expression of SEAP upon
stimulation of EGFR. The pTA-SEAP vector was transfected into the EGFR/CHO
and HeLa cells and stable transformants were selected using neomycin. Again, stimulation
of the pathway occurred in the presence of serum factors in the media. Upon serum
starvation, this response was greatly reduced (Figure 2B). Single high expressing
clones will be isolated following stimulation with EGF and sorting using a flow
cytometer.

Development of ultra high throughput FACS assay 

Complex mixed population libraries (>10^6 primary clones/library)
were generated that provide access to the untapped biodiversity that exists in the >99%
uncultivable microorganisms. These novel libraries require the development of ultra
high throughput screening methods to obtain complete coverage of the library. We
propose developing an assay using the flow cytometer that allows detection of up to
10^8 clones/day.

In this assay format (Figure 1), an expression host (Streptomyces, E.
coli) and a mammalian reporter cell will be co-encapsulated together within a
microdrop. The microdrop holds the cells in close proximity to each other and
provides a microenvironment that facilitates the exchange of biomolecules between the
two cell types. The reporter cell will have a fluorescent readout and the entire
microdrop will be run through the flow cytometer for clonal isolation. The DNA
from the genes or pathway of interest will subsequently be recovered using in vitro
molecular techniques. This assay format will be validated for the discovery of both
EGFR inhibitors and small molecules that induce apoptosis. With validation
of this format, we will progress to the ultra high throughput screening phase designed
to discover novel chemotherapeutic agents active against these important molecular
mechanisms underlying tumorigenesis.

The feasibility of this approach will be analyzed initially using the
engineered cell lines described above that respond to activation by EGF with
increased expression of a reporter protein (i.e. EGFP or alkaline phosphatase).
Additionally, this initial study will use an E. coli host that over-expresses human EGF
as a secreted protein directed to the bacterial periplasm (34). This approach will
allow us to validate the assay format prior to screening for inhibitors of the EGFR
pathway using our E. coli and Streptomyces expression libraries. For this experiment,
the engineered cell lines will be co-encapsulated together with the E. coli host at a
ratio of one to one. The EGF-expressing bacteria will be allowed to grow and form a
colony within the microdrop. Due to the vastly higher growth rate of bacteria, a
colony of bacteria will form before the eukaryotic cell has divided, or after only
minimal division. This colony will then provide a significantly increased concentration of the
bioactive molecule. The bacterial colony will be selectively lysed using the antibiotic
polymyxin at a concentration that allows survival of the eukaryotic cell (35). This antibiotic
acts to perforate bacterial cell walls and should result in the release of EGF from these cells
without affecting the eukaryotic cell. In the final discovery assays, this lysis
treatment should not be necessary as the small molecule products will likely be able to
freely diffuse out of the cell. The EGF will activate the signal transduction pathway
in the eukaryotic cell and turn on expression of the reporter protein.

The microdrops will be run through the flow cytometer and those
microdrops exhibiting an increased fluorescence will be sorted. The DNA from the
sorted microdrops will be recovered using PCR amplification of the insert encoding
EGF. For the reporter cells expressing secreted alkaline phosphatase, a couple of
additional steps are required to achieve a fluorescent readout. As the enzyme is
secreted from the cell, it is possible to prevent the diffusion of the protein from the
microdrop by selectively capturing it within the matrix of the microdrop. This can be
accomplished by using microdrops made with agarose derivatized with biotin. By
forming a sandwich with streptavidin and a biotinylated anti-alkaline phosphatase
antibody, it is possible to capture alkaline phosphatase where it can catalyze the
conversion of the ELF-97 phosphate substrate within the microdrop (Figure 3A).
This technique was successfully developed by One Cell Systems for the isolation of
high expressing hybridomas (36, 37). In our hands, with the encapsulation of the
SEAP expressing cells, we have shown that upon addition of the ELF-97 phosphatase
substrate, a fluorescent precipitate forms within the microdrop (Figure 3B&C).

Initial experiments demonstrate the feasibility of co-encapsulating E.
coli and mammalian cells (e.g., CHO) within microdrops. Microdrops were formed
using 3% agarose dropped in oil and blended at 2600 rpm. The E. coli and CHO cells
were encapsulated at a ratio of 1:1 (Figure 4A). After 6 hours, the single bacterial cell
grew into a colony containing thousands of cells (Figure 4B). The cells within the
microdrops were stained with propidium iodide to determine viability and
approximately 70-85% of the CHO cells remained viable after 24 hours. Subsequent
steps include determining the response of encapsulated clonal EGF-responsive
mammalian cells to varying concentrations of EGF in the presence and absence of
EGFR inhibitors such as Tyrphostin A46 or Tyrphostin A48 (Calbiochem). In
addition, E. coli clones producing high levels of secreted EGF will be isolated using
the Quantikine human EGF immunoassay (R&D Systems). Finally, these two cell
types will be brought together within the microdrop and a change in fluorescence of
the eukaryotic cell will be analyzed on the flow cytometer in the presence and absence
of the EGFR inhibitors. A positive result in this experiment would be an increase in
fluorescence that can be blocked by the EGFR inhibitors.

The next step will be to mix the EGF-expressing E. coli with non-expressing
cells at varying ratios from 1:1,000 to 1:1,000,000 to mimic the conditions
of a mixed population library discovery screen. The bacterial mixtures and the
mammalian cells will be co-encapsulated as described above. The highly fluorescent
microdrops will be individually sorted by the flow cytometer. To confirm a positive
hit, the DNA will be recovered by PCR amplification using primers directed against
the EGF gene. To improve the signal to noise ratio, it is likely that it will be
necessary to undergo several rounds of enrichment before isolation of positive
EGF-expressing clones, especially for the higher mixture ratios.
In this case, the microdrops will first be sorted in bulk, the microdrop
material removed with GELase (Epicentre Technologies) and the bacteria allowed to
grow. The encapsulation protocol will be repeated with fresh eukaryotic cells until a
highly enriched population is observed. At this point, single microdrops will be
isolated and recovery of the EGF-expressing clone confirmed by PCR. With
validation of this assay, the goal will be to screen for inhibitors of the EGFR using our
mixed population libraries expressed in optimized E. coli and Streptomyces hosts.
This assay will be done in the presence of EGF and the assay endpoint will be a
decrease in fluorescence. This format is not limited to only EGFR inhibitors as any
protein within this pathway could be inhibited and would appear positive in this
screen. Likewise, this screen can also be adapted to the multitude of anti-cancer
targets that are known to regulate gene expression. In fact, using this present system,
with the addition of the appropriate receptors, it would be possible to screen for
inhibitors of other growth factors such as PDGF and VEGF.
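
As a rough planning aid for the enrichment rounds described above, the following Python
sketch estimates how many bulk-sort and regrowth cycles might be needed to bring a rare
positive clone to a chosen abundance. It is illustrative only: the function name
rounds_needed and the per-round fold enrichment are assumptions made for this example and
are not values reported in this application; the real enrichment per sort would have to be
measured.

```python
import math


def rounds_needed(start_ratio, target_fraction, fold_per_round):
    """Estimate how many bulk-sort rounds reach the target positive fraction.

    start_ratio      -- positives per negative at the start, e.g. 1 / 1_000_000
    target_fraction  -- desired fraction of positives after enrichment (0 to 1)
    fold_per_round   -- ASSUMED fold increase in the positive:negative odds
                        for each sort-and-regrow cycle (hypothetical value)
    """
    if not 0 < target_fraction < 1:
        raise ValueError("target_fraction must lie strictly between 0 and 1")
    if fold_per_round <= 1:
        raise ValueError("fold_per_round must be greater than 1")
    # Work in odds (positives : negatives) so each round is a simple multiplication.
    target_odds = target_fraction / (1 - target_fraction)
    if start_ratio >= target_odds:
        return 0
    return math.ceil(math.log(target_odds / start_ratio) / math.log(fold_per_round))


if __name__ == "__main__":
    # Mixing ratios taken from the text; the 50-fold figure is purely illustrative.
    for ratio in (1 / 1_000, 1 / 1_000_000):
        print(ratio, rounds_needed(ratio, 0.9, 50))
```

Under these assumptions the number of rounds grows only logarithmically with the dilution,
which is consistent with expecting more rounds at the higher mixture ratios.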

If an increase in fluorescence is not observed with co-encapsulation of 
the EGF-expressing cells and the mammalian reporter cell, there could be several 
reasons. First, it is possible that the EGF diffuses out of the microdrop too quickly to elicit a
response. In this case, it will be necessary to modify the microdrops to limit diffusion 
and concentrate the bioactive molecule at the site of the reporter cell. It is also 
possible that in the specific case of the EGF assay, the cells will not continue to 
produce EGF after polymyxin treatment and thus, the incubation time of the reporter 
cells with EGF will be minimal. This is unlikely as the polymyxin treatment used will 
be at concentrations well below that which produces decreased cell viability. 
However, if EGF is not continually expressed in this system, other permeabilization 
methods will be explored that do not significantly affect cell metabolism, such as the 
bacteriocin release protein (BRP) system (Display Systems Biotech). The BRP opens 
the inner and outer membranes of E. coli in a controlled manner enabling protein 
release into the culture medium. This system can be used for large-scale protein 
production in a continuous culture and thus should be compatible with cell survival. 

Apoptosis, or programmed cell death, is the process by which the cell
undergoes genetically determined death in a predictable and reproducible sequence.
This process is associated with distinct morphological and biochemical changes that
distinguish apoptosis from necrosis. The malfunctioning of this essential process can
often lead to cancer by allowing cells to proliferate when they should either
self-destruct or stop dividing. Thus, the mechanisms underlying apoptosis are currently
under intense scrutiny from the research community and the search for agents that
induce apoptosis is a very active area of discovery.

The present invention provides an assay for the discovery of apoptotic
molecules using our ultra high throughput encapsulation technology. The source of
these small molecules will be our extremely complex mixed population
libraries expressed in Streptomyces and E. coli host strains. These host strains will be
co-encapsulated together with a eukaryotic reporter cell, the small molecule will be
produced in the bacterial strain, and will act on the mammalian reporter cell which
will respond by induction of apoptosis. Apoptosis will be detected using a fluorescent
marker, the entire microdrop sorted using the flow cytometer, and the DNA of interest
recovered. The feasibility of this assay will be determined using our optimized
Streptomyces host strain, S. diversa, co-encapsulated with the apoptotic reporter cell
derived from human T cell leukemia (e.g., Jurkat cells). The pathway controlling
production of the anti-tumor antibiotic, bleomycin, will be cloned into S. diversa as
the source of an apoptosis-inducing agent. The readout for induction of apoptosis in
Jurkat cells will be obtained using the fluorescent marker, Alexa 488-annexin V™.

The bleomycin group of compounds are anti-tumor antibiotics that are
currently being used clinically in the treatment of several types of tumors, notably
squamous cell carcinomas and malignant lymphomas. However, widespread use of
bleomycin congeners has been limited due to early drug resistance and the pulmonary
toxicity that develops concurrent with administration of this drug. Thus, there is
continuing effort to find novel small molecules with better clinical efficacy and lower
toxicity. Bleomycin congeners are peptide/polyketide metabolites that function by
binding to sequence selective regions of DNA and creating single and double stranded
DNA breaks. Several in vitro and in vivo assays have shown that bleomycin induces
apoptosis in eukaryotic cells (43-45). The biosynthetic gene cluster encoding the
production of bleomycin has recently been cloned from Streptomyces verticillus and
is encoded on a contiguous 85 kb fragment (46). We propose to clone this pathway
into a BAC vector to use as a source of apoptotic agents in eukaryotic cells. A library
will be made from the S. verticillus ATCC 15003 strain and cloned into the BAC
vector, pBlumate2. As the sequence for this pathway is known, probes will be
designed against sequences from the 5' and 3' ends of the pathway. The library will
be introduced into E. coli and screened using colony hybridization with the probe
generated against one end of the pathway. Positive clones will subsequently be
screened with the second probe to identify which clone contains the entire pathway.
Clones containing the complete pathway will be transferred into our optimized
expression host S. diversa by mating. Expression of bleomycin will be detected using
whole cell bioassays with Bacillus subtilis.

Jurkat cells are the classic human cell line used for studies of
apoptosis. The fluorescent Alexa 488 conjugate of annexin V (Molecular Probes) will
be used as the marker of apoptosis in these cells. Annexin V binds to
phosphatidylserine molecules normally located on the internal portion of the
membrane in healthy cells. During early apoptosis, this molecule flips to the outer
leaflet of the membrane and can be detected on the cell surface using fluorescent
markers such as the annexin V-conjugates. The bleomycin-induced apoptotic
response in Jurkat cells will initially be characterized by varying both the
concentrations of the exogenously administered drug and the incubation time with the
drug. Alexa 488-annexin V will then be added to the cells and the level of fluorescence
analyzed on the flow cytometer. Necrotic cell death will be determined using
propidium iodide and the apoptotic population will be normalized to this value.
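
One possible way to turn the two-color readout described above into a number is sketched
below in Python. This is a hedged illustration, not the analysis used in this application:
the thresholds annexin_cut and pi_cut, the function names, and the interpretation of
"normalized" as "expressed as a fraction of the propidium iodide-negative events" are all
assumptions introduced for the example.

```python
from collections import Counter


def classify_event(annexin, pi, annexin_cut, pi_cut):
    """Label one flow cytometry event from its two fluorescence values.

    annexin_cut and pi_cut are hypothetical gate positions; in practice
    they would be set from unstained and single-stained controls.
    """
    if pi >= pi_cut:
        return "pi_positive"      # membrane-compromised (necrotic or late-stage) cells
    if annexin >= annexin_cut:
        return "apoptotic"        # annexin V positive, propidium iodide negative
    return "viable"


def apoptotic_fraction(events, annexin_cut, pi_cut):
    """Fraction of propidium iodide-negative events that are annexin V positive."""
    counts = Counter(classify_event(a, p, annexin_cut, pi_cut) for a, p in events)
    pi_negative = counts["apoptotic"] + counts["viable"]
    if pi_negative == 0:
        return 0.0
    return counts["apoptotic"] / pi_negative


if __name__ == "__main__":
    # (annexin V signal, propidium iodide signal) in arbitrary units; made-up values.
    demo = [(900, 10), (50, 20), (800, 700), (40, 15)]
    print(apoptotic_fraction(demo, annexin_cut=200, pi_cut=200))
```

Excluding propidium iodide-positive events before computing the fraction keeps necrotic
cells, which can also bind annexin V, from inflating the apoptotic count.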

Co-encapsulation of S. diversa with CHO cells within microdrops
produced very similar results to the E. coli co-encapsulation. S. diversa grew well in
the eukaryotic media and the CHO cell survival rate was high after 24 hours. In this
experiment, the S. diversa clone expressing bleomycin will be co-encapsulated with
the Jurkat cell line. S. diversa will be allowed to grow into a colony within the
microdrop and begin production of bleomycin. The microdrops will be periodically
analyzed over time for induction of apoptosis using the Alexa 488-annexin V
conjugate on the microscope and flow cytometer. After noting the time for induction
of apoptosis, a mixing experiment similar to that described for the EGF experiment
will be performed. Bleomycin-expressing and non-expressing cells will be mixed
together at ratios of 1:1,000 to 1:1,000,000. Co-encapsulation of the mixtures with
Jurkat cells will be performed and the appropriate incubation time maintained. These
microdrops will then be stained with Alexa 488-annexin V and sorted on the flow
cytometer. Confirmation of a positive bleomycin-expressing sorted clone will be
performed by PCR amplification of a portion of the pathway. Again, it is likely that
enrichment of these mixtures will be necessary using a few rounds of bulk sorting
on the flow cytometer.

If no apoptosis is observed in the initial assay, confirmation of
bleomycin production will be performed by sorting of the encapsulated S. diversa
clone into 1536 well plates. After a predetermined incubation period, the supernatant
will be removed and spotted on filter disks for whole cell bioassays using the
susceptible strain B. subtilis. Use of the 1536 well plates will hopefully avoid
significant dilution of the antibiotic in the media. As cloning of the bleomycin
pathway is quite recent, it has not yet been heterologously expressed from the
complete pathway. However, Du et al. demonstrated the heterologous bioconversion
of the inactive aglycones into active bleomycin congeners by cloning a portion of the
pathway into a S. lividans host (46). If bleomycin expression is not detectable in our
assay, we will employ a similar strategy using our host strain S. diversa. If little
bleomycin production is detected under these conditions, it will be necessary to
optimize the culture conditions for S. diversa to induce pathway expression within the
microdrop. On the other hand, if bleomycin is produced but apoptosis is not
observed, it is possible that the molecule is diffusing away from the microdrop too
quickly and it will be necessary to optimize the microdrop technology to concentrate
the metabolite at the site of the reporter cell.

Optimization of S. diversa secondary metabolite expression in microdrops 
Induction of pathway expression is an issue that is not limited to the
bleomycin example. Bioactive small molecules within microorganisms are often
produced to increase the host's ability to survive and proliferate. These compounds
are generally thought to be nonessential for growth of the organism and are
synthesized with the aid of genes involved in intermediary metabolism, hence the
name "secondary metabolites." Thus, the pathways controlling expression of these
secondary metabolites are often regulated under non-optimal conditions such as stress
or nutrient limitation. As our system relies on use of the endogenous promoters and
regulators, it might be necessary to optimize conditions for maximal pathway
expression.

There are several methods that can be used to optimize for increased
pathway expression within the microdrops. For easy detection of maximal
expression, we will construct a transposon containing a promoter-less GFP. The
enhanced GFP optimized for eukaryotes will be used as it has a codon bias for high
GC organisms. Transposition into a known pathway (e.g., actinorhodin) will be done
in vitro and the vector containing the pathway purified. The transposants will be
introduced into an E. coli host, screened for clones that express GFP, and positive
clones isolated on the flow cytometer. With the transfer of the promoter-less gene for
GFP into the pathway, increased fluorescence within the cells would demonstrate
transcription of the pathway using the endogenous promoters located within the
pathway. This clone will be used as a tool for quick detection of upregulation in
pathway expression due to changes in the experimental conditions.

The S. diversa clone containing GFP and the actinorhodin pathway
will be encapsulated in the microdrops and several different growth conditions will be
tested, e.g., conditioned media, nutrient limiting media, known inducing factors,
varying incubation times, etc. The microdrops will be analyzed under the microscope
and on the flow cytometer to determine which conditions produce optimal expression
of the pathway. These conditions will be verified for viability in eukaryotic cells as
well. These optimized growth conditions will be confirmed using the bleomycin
pathway to assess production of the secondary metabolite. Additionally, whole cell
optimization of S. diversa is ongoing with production of strains that are missing
different pleiotropic regulators that often negatively impact secondary metabolite
production. As these strains are developed, they will be analyzed in the microdrops
for enhanced pathway expression.

The proximity of the two cell types within the microdrop should result
in a high concentration of the bioactive molecule at the site of the reporting cell.
However, if rapid diffusion of the molecule from the microdrop prevents detection of
the desired signal, it will be necessary to optimize the microdrop protocol or develop
a new encapsulation technology. Concentration of the molecule at the site of the
reporter cell could be achieved by a reduction in the microdrop pore size. Pore size
reduction can be accomplished by one or a combination of the following approaches:
(i) "plugging" the holes with particles of an appropriate size, which are
held in the pores by non-covalent or covalent interactions; (ii) cross-linking of the
microdrop-forming polymer with low molecular weight agents; (iii) creation of an
external shell around the microdrop with pores of smaller size than those in the
current microdrop.

(i) Plugging the pores can be accomplished using polydisperse latexes 
with particles sized to fit within the pores of the microdrop. Latex 
particles may be modified on their surface such that they are attracted 
to the microdrop-forming polymer. For example, agarose-based 
microdrops carry a negative electrostatic charge on the surface. Thus,
amidine-modified polystyrene latex particles (Interfacial Dynamics 
Corporation) will be attracted to the microdrop surface and the latex 
particles will effectively plug the microdrop pores provided that the 
charge density on the latex particles and the microdrop surface is high 
enough to sustain strong electrostatic bonds. 

(ii) Cross-linking of agarose beads can be achieved by treating them with 
various reagents according to known procedures (47). For our 
purposes, the cross-linking needs to occur only on the surface of the
microdrop. Thus, it may be advantageous to use polymers carrying
reactive groups for cross-linking of agarose, such that permeation of 
the cross-linking agent inside the microdrop is prevented. 

(iii) Formation of classical (48) or polymerizable liposomes (49,50) around 
microdrops would provide a shell that could be an effective barrier 
even to small molecules. A wide variety of precursors for such 
liposomes as well as methods for their preparation have been reported 
(48-50) and most of them are applicable for our purposes. One of the 
possible limitations in choice of precursors stems from the intended 
use of microdrops for eventual screening by the flow cytometer. Thus, 
the liposomes should not absorb in the visible part of the spectrum. 
It might also be necessary to use alternative methods and materials for
preparation of the microdrops. Encapsulation of cells in polyacrylamide, alginate,
fibrin, and other gel-forming polymers has been described (51). Another plausible
candidate for encapsulation material is silica gel, which can be formed under
physiological conditions with the assistance of enzymes (silicateins) (52) or enzyme
mimetics (53). Additionally, various polymers may be used as the material for
microdrop construction. Microdrops may be formed either upon polymerization of
monomers (i.e., water-soluble acrylates or methacrylates) or upon gelation and/or
cross-linking of preformed polymers (polyacrylates, polymethacrylates, polyvinyl alcohol).
Since the formation of microdrops occurs simultaneously with encapsulation of living
cells, such formation has to proceed under conditions compatible with cell survival.
Thus, the precursors for microdrops (monomers or non-gelated polymers) should be
soluble in aqueous media at physiological conditions and capable of the
transformation into the microdrop material without any significant participation
and/or emission of toxic compounds.

Example 15: Identification of a Novel Bioactivity or Biomolecule of Interest by Mass
Spectroscopic Screening

An integrated method for the high throughput identification of novel
compounds derived from large insert libraries by Liquid Chromatography - Mass
Spectrometry was performed as described below.

A library from a mixed population of organisms was prepared. An
extract of the library was collected. Extracts from the libraries were either pooled or
kept separate. Control extracts, without a bioactivity or biomolecule of interest, were
also prepared.

Rapid chromatography was used with each extract, or combination of
extracts, to aid the ionization of the compound in the spectra. Mass spectra were
generated for the natural product expression host (e.g., S. venezuelae) and vector alone
(e.g., pJ0436) system. Mass spectra were also generated for the host cells containing
the library extracts, alone or pooled. The spectra generated from multiple runs of
either the background samples or the library samples were combined within each set
to create a composite spectrum. Composite spectra may be generated by using a
percentage occurrence of an average intensity of each binned mass per time period or
by using multiple aligned single mass spectra over a time period. By using a
redundant sampling method where each sample was measured several times in the
presence of other extracts, the novel signals that consistently occurred within a sample
extract but not within the background spectra were determined.
The host-vector background spectrum was compared to the mass
spectra obtained from large insert library clone extracts. Extra peaks observed in the
large insert library clone extracts were considered as novel compounds and the
cultures responsible for the extracts were selected for scale-up culture so the compound
can be isolated and identified.
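
The percentage-occurrence comparison described above can be sketched as follows. This
Python fragment is an illustration written for this passage, not the Matlab program
referenced later in this Example; the function names, the one-mass-unit bin width, and the
toy spectra are assumptions, and the 40 percent default simply mirrors the CutoffPercent
value that appears in the Matlab listing.

```python
import math
from collections import Counter


def composite_masses(runs, cutoff_percent=40.0):
    """Return the binned masses present in at least cutoff_percent of the runs.

    runs -- list of spectra; each spectrum is a list of (mass, intensity) pairs.
    Masses are binned to the nearest whole mass unit (an assumption of this sketch).
    """
    if not runs:
        return set()
    occurrences = Counter()
    for spectrum in runs:
        # Count each bin at most once per run, ignoring zero-intensity points.
        bins = {math.floor(mass + 0.5) for mass, intensity in spectrum if intensity > 0}
        occurrences.update(bins)
    return {b for b, n in occurrences.items() if 100.0 * n / len(runs) >= cutoff_percent}


def novel_masses(sample_runs, background_runs, cutoff_percent=40.0):
    """Binned masses consistently seen in the sample but not in the background."""
    sample = composite_masses(sample_runs, cutoff_percent)
    background = composite_masses(background_runs, cutoff_percent)
    return sorted(sample - background)


if __name__ == "__main__":
    # Made-up spectra: mass 301 is host background, mass 450 is clone-specific.
    sample = [[(301.1, 5), (450.2, 3)], [(301.0, 4)], [(301.2, 6), (450.1, 2)]]
    background = [[(301.1, 5)], [(301.0, 2)]]
    print(novel_masses(sample, background))
```

Counting occurrences rather than summing intensities favors signals that recur across
repeated measurements, which is the property the redundant sampling is meant to exploit.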

Novel metabolite identification by mass spectroscopic screening

An integrated method for the high throughput identification of novel
compounds derived from large insert libraries by LC-MS is described below. Liquid
chromatography-mass spectrometry is used to determine the background mass spectra
of the natural product expression host (e.g., S. diversa DS10 or DS4) and vector alone
(e.g., pmf17) system. This host-vector background spectrum is compared to the mass
spectra obtained from large insert library clone extracts. Extra peaks observed in the
large insert library clone extracts are considered as novel compounds and the cultures
responsible for the extracts are selected for scale-up culture so the compound can be
isolated and identified.

In order to create the background and sample spectra, rapid
chromatography is used to aid the ionization of the compounds in the extract. The
spectra generated from multiple runs of either the background samples or the library
samples are combined within each set to create a composite spectrum. Composite
spectra may be generated by using a percentage occurrence of an average intensity of
each binned mass per time period or by using multiple aligned single mass spectra
over a time period. Using a redundant sampling method whereby each sample is
measured several times in the presence of other extracts, the novel signals that
consistently occur within a sample extract but are not present in the background spectra
can be determined. The purpose of this invention is to identify novel compounds
produced by recombinant genes encoding biosynthetic pathways without relying on
the compounds having bioactivity. This detection method is expected to be more
universal than bioactivity for identifying novel compounds.
Currently there is a similar method of examining culture mixtures by
LC-MS with long chromatographic times (30-60 min) to bring compounds to a fairly
high level of purity. This method relies on molecular weight searches for
de-replication of known compounds. This slow method would also work to identify
novel compounds in S. diversa libraries; however, the throughput would be inadequate
for the number of samples we need to screen. There is a pair of publications
describing rapid direct infusion analysis of samples to identify fermentation
conditions which improve the biosynthetic productivity of strains. This method does
not identify specific compounds; it just correlates greater, more complex production
with different culture conditions.
Shown below are the following:

1. Chromatographic gradient and mass spec conditions
   • HPLC and MS setting for Mass Spec Screening.TXT
2. Pooling of samples sheet
   • SamplingStrategy.htm
3. Sample flow using average method
   • Mass Spec Screening Flow chart.doc
4. Matlab code for original average background
   • Mass Spec Screening Summary6 Matlab code.txt
5. Matlab code under development for new single aligned peaks background
   determination for more accurate data analysis.
   • Mass Spec Screening 2nd Data Analysis Program.txt

The method is best practiced with a set of control extracts and sample extracts.
Mixing of the compounds in pools prior to analysis and deconvolution of the mixed
extract pools will provide high throughput while maintaining the ability to measure
each extract several times.
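
One simple way such pooling and deconvolution could work is sketched below in Python. The
pooling sheet itself (SamplingStrategy.htm) is not reproduced in this text, so the
row-and-column grid layout used here is a hypothetical stand-in chosen only because it
measures every extract twice; the function names grid_pools and deconvolute are likewise
assumptions of the example.

```python
def grid_pools(n_rows, n_cols):
    """Build row pools and column pools for extracts laid out on a grid.

    Each extract is identified by its (row, column) position and therefore
    belongs to exactly two pools. This layout is hypothetical.
    """
    pools = {}
    for r in range(n_rows):
        pools[f"row{r}"] = {(r, c) for c in range(n_cols)}
    for c in range(n_cols):
        pools[f"col{c}"] = {(r, c) for r in range(n_rows)}
    return pools


def deconvolute(pools, pool_signals):
    """Assign a mass to an extract only if every pool containing it shows that mass.

    pools        -- pool name -> set of extract ids
    pool_signals -- pool name -> set of novel masses observed in that pool
    """
    extracts = set().union(*pools.values())
    assigned = {}
    for extract in extracts:
        member_pools = [name for name, members in pools.items() if extract in members]
        common = set.intersection(*(set(pool_signals.get(name, ())) for name in member_pools))
        if common:
            assigned[extract] = common
    return assigned


if __name__ == "__main__":
    pools = grid_pools(2, 2)
    # Made-up observation: mass 512 shows up only in the pools that contain extract (0, 1).
    signals = {"row0": {512}, "col1": {512}, "row1": set(), "col0": set()}
    print(deconvolute(pools, signals))
```

With four extracts this layout needs four pooled injections, so the saving only appears on
larger grids: an 8 x 12 plate would need 20 pooled runs instead of 96 while still measuring
each extract twice.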

A secondary screen may be required to eliminate false positives.
This method is more specific for identifying potential novel compounds by molecular
ion than current methods. This method uses a different data analysis strategy than the
de-replication methods for the identification of specific peaks for new compounds in
extracts. Using the molecular ion as a signal to collect on, this method may be coupled
to mass-based collection methods for the rapid isolation of compounds.
Related references:

"Rapid Method to Estimate the Presence of Secondary Metabolites in Microbial",
Higgs, R.E.; Zahn, et al., Appl. Environ. Microbiol. 67:371-376.

"Use of direct-infusion electrospray mass spectrometry to guide empirical
development of improved conditions for expression of secondary metabolites from
Actinomycetes", Zahn, et al., Appl. Environ. Microbiol. 67:377-386.

"A general method for the de-replication of flavonoid glycosides utilizing high
performance liquid chromatography mass spectrometric analysis." Constant, et al.,
Phytochemical Analysis, 1997, 8:176-180.

Method Information

Gradient column analysis of crude extracts by positive ion mode.

1100 Quaternary Pump 1

Control
   Column Flow             : 1.000 ml/min
   Stoptime                : 4.00 min
   Posttime                : Off
Solvents
   Solvent A               : 98.0 % (Water)
   Solvent B               : 0.0 % (MeOH)
   Solvent C               : 2.0 % (AcCN)
   Solvent D               : 0.0 % (iPrOH)
PressureLimits
   Minimum Pressure        : 0 bar
   Maximum Pressure        : 400 bar
Auxiliary
   Maximal Flow Ramp       : 100.00 ml/min^2
   Primary Channel         : Auto
   Compressibility         : 100*10^-6/bar
   Minimal Stroke          : Auto
Store Parameters
   Store Ratio A           : Yes
   Store Ratio B           : Yes
   Store Ratio C           : Yes
   Store Ratio D           : Yes
   Store Flow              : Yes
   Store Pressure          : Yes
Agilent 1100 Contacts Option
   Contact 1               : Open
   Contact 2               : Open
   Contact 3               : Open
   Contact 4               : Open
Timetable
   Time    Solv.B   Solv.C   Solv.D   Flow    Pressure
   0.00      0.0      2.0      0.0    1.000
   0.01      0.0      2.0      0.0
   0.30      0.0     95.0      0.0
   1.50      0.0     95.0      0.0
   1.60      0.0      2.0      0.0
   4.00      0.0      2.0      0.0

Agilent 1100 Contacts Option Timetable
   Timetable is empty
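
To make the pump timetable above easier to read, the following Python sketch computes the
solvent composition at any time during the 4-minute run. It assumes the pump ramps linearly
between timetable entries, which is the usual behavior for gradient timetables but is not
stated in the method printout; the names TIMETABLE, percent_c and percent_a are
introduced only for this illustration.

```python
# (time in min, % Solvent C = acetonitrile), copied from the pump timetable.
TIMETABLE = [
    (0.00, 2.0),
    (0.01, 2.0),
    (0.30, 95.0),
    (1.50, 95.0),
    (1.60, 2.0),
    (4.00, 2.0),
]


def percent_c(t):
    """Percent Solvent C at time t (min), ASSUMING linear ramps between entries."""
    if t <= TIMETABLE[0][0]:
        return TIMETABLE[0][1]
    for (t0, c0), (t1, c1) in zip(TIMETABLE, TIMETABLE[1:]):
        if t0 <= t <= t1:
            return c0 + (c1 - c0) * (t - t0) / (t1 - t0)
    # After the last entry the final composition is held.
    return TIMETABLE[-1][1]


def percent_a(t):
    """Percent Solvent A (water); Solvents B and D stay at 0.0 % in this method."""
    return 100.0 - percent_c(t)


if __name__ == "__main__":
    for t in (0.0, 0.15, 1.0, 1.55, 3.0):
        print(f"{t:4.2f} min: {percent_c(t):5.1f} % AcCN, {percent_a(t):5.1f} % water")
```

Read this way, the method is a fast ramp to 95 % acetonitrile by 0.30 min, a hold until
1.50 min, and a return to the 2 % starting conditions for re-equilibration until 4.00 min.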

Agilent 1100 Diode Array Detector 1

Signals
   Signal   Store   Signal,Bw [nm]   Reference,Bw [nm]
   A:       Yes     215  4           450  100
   B:       No      254  4           450  100
   C:       No      280  4           450  100
   D:       No      250  16          Off
   E:       No      280  16          Off
Spectrum
   Store Spectra           : Apex + Baselines
   Range from              : 190 nm
   Range to                : 600 nm
   Range step              : 2.00 nm
   Threshold               : 1.00 mAU
Time
   Stoptime                : As pump
   Posttime                : Off
Required Lamps
   UV lamp required        : Yes
   Vis lamp required       : Yes
Autobalance
   Prerun balancing        : Yes
   Postrun balancing       : No

Margin for negative Absorbance: 100 mAU

Peakwidth                  : 0.1 min
Slit                       : 4 nm
Analog Outputs
   Zero offset ana. out. 1 : 5 %
   Zero offset ana. out. 2 : 5 %
   Attenuation ana. out. 1 : 1000 mAU
   Attenuation ana. out. 2 : 1000 mAU



Mass Spectrometer Detector

General Information
   Use MSD                 : Enabled
   Ionization Mode         : APCI
   Tune File               : atunes.tun
   StopTime                : asPump
   Time Filter             : Enabled
   Data Storage            : Condensed
   Peakwidth               : 0.15 min
   Scan Speed Override     : Disabled

Signals
[Signal 1]
   Polarity                : Positive
   Fragmentor Ramp         : Disabled
   Scan Parameters
   Time (min) | Mass Range Low | Mass Range High | Fragmentor | Gain EMV | Threshold | Stepsize
   0.00       | 110.00         | 1500.00         | 70         | 1.0      | 500       | 0.15
[Signal 2]
   Polarity                : Positive
   Fragmentor Ramp         : Disabled
   Scan Parameters
   Time (min) | Mass Range Low | Mass Range High | Fragmentor | Gain EMV | Threshold | Stepsize
   0.00       | 110.00         | 1500.00         | 110        | 1.0      | 500       | 0.15
[Signal 3]
   Not Active
[Signal 4]
   Not Active

Spray Chamber
[MSZones]
   Gas Temp                : 350 C        maximum 350 C
   Vaporizer               : 375 C        maximum 500 C
   DryingGas               : 3.0 l/min    maximum 13.0 l/min
   Neb Pres                : 60 psig      maximum 60 psig
   VCap (Positive)         : 3000 V
   VCap (Negative)         : 3000 V
   Corona (Positive)       : 4.0 µA
   Corona (Negative)       : 15 µA

FIA Series
   FIA Series in this Method : Disabled
   Time Setting
   Time between Injections   : 1.00 min



Agilent 1100 Column Thermostat 1

Temperature settings
   Left temperature        : 35.0°C
   Right temperature       : Same as left
   Enable analysis         : When Temp. is within setpoint +/- 0.8°C
   Store left temperature  : Yes
   Store right temperature : No
Time
   Stoptime                : As pump
   Posttime                : Off
Column Switching Valve     : Column 2
   Timetable is empty

During the process, create a background file by looking for a certain percentage signal
occurrence per mass unit. Use the Summary.m program to create this background
spectrum for use later in step 5 below.

Step 1. Optional - Pool samples
   Use attached pooling strategy.

Step 2. Measure Data
   Use LC-MS to acquire data.

Step 3. Extract Data
   Extract mass spectra into csv file format.

Step 4. Identify consistent signals in sample
   • Deconvolute pools if sample pooling in step 1 was used.
   Compare same sample runs to each other using the Summary.m program; bin
   frequently/universally occurring signals.

Step 5. Determine Unique Peaks in Sample vs. Background
   1. Convert percent occurrence per mass into a new sample spectra file.
   2. Use Massieve to determine unique peaks in all voltages and
      chromatographic fractions compared to background.
   3. Create 'Unique Peaks' file for each voltage, chromatographic peak
      comparison.

Step 6. Eliminate extra peaks by taking advantage of multiple MS detection
   channels and chromatographic conditions.
   Feed 'Unique Peak' file for each sample back into Summary.m program; keep
   peaks that show up in more than one mass spectrometer channel or
   chromatographic peak.

Step 7. Short list of novel compound signals
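
The filtering idea in step 6 of the flow chart, keeping only peaks that appear in more than
one detection channel, can be illustrated with the short Python sketch below. It is not the
Summary.m program; the function name confirmed_peaks, the channel labels, and the
whole-mass-unit binning are assumptions made for the example. The two channels shown
correspond to the two MS signals in the method above, which differ in fragmentor setting
(70 and 110).

```python
from collections import Counter


def confirmed_peaks(unique_by_channel, min_channels=2):
    """Return binned masses flagged as unique in at least min_channels channels.

    unique_by_channel -- channel label -> list of masses from that channel's
                         'Unique Peaks' result (positive values)
    """
    seen = Counter()
    for masses in unique_by_channel.values():
        # Bin to the nearest whole mass unit and count each bin once per channel.
        seen.update({int(m + 0.5) for m in masses})
    return sorted(mass for mass, n in seen.items() if n >= min_channels)


if __name__ == "__main__":
    # Made-up unique-peak lists for the two fragmentor settings.
    unique = {
        "fragmentor_70": [301.1, 512.2],
        "fragmentor_110": [512.3, 640.0],
    }
    print(confirmed_peaks(unique))
```

Requiring agreement between channels trades some sensitivity for fewer false positives,
which fits the note elsewhere in this Example that a secondary screen may still be needed.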



clear
dir

CompressCount=1;
TestFileData=[12 34 45 56 67]

MasterDir='C:\HPCHEM\1\DATA\MS20FEBA\IND4TST';   % User inputted directory containing other directories with files
cd(MasterDir);

MasterDirFiles = dir   % Load all files in master directory to one variable.

TotalFiles = size(MasterDirFiles)
Original_Files='Original_Files';
X=990099;

% Loop to create compressed directory listing containing only directories.
for ExtractDir=1:TotalFiles(1,1)
    % Look through dir items to find directories in master directory
    if MasterDirFiles(ExtractDir).isdir==1   % Test each dir item to see if it is a directory
        Is_Original_Files=strcmp(MasterDirFiles(ExtractDir).name,Original_Files);
        if not(Is_Original_Files)
            CompressedDirList(CompressCount).name = MasterDirFiles(ExtractDir).name;   % assign new directories.
            CompressCount=CompressCount+1;   % Increment count of compressed directories
        end
    end
end

CompressCount

TotalDirectories=size(CompressedDirList);
CompressCount=1;

% Main loop for moving in and out of directories.
% The loop starts at 3, presumably to skip the '.' and '..' entries returned by dir.
for CompressCount=3:TotalDirectories(1,2)

    CurrentDirectory = CompressedDirList(CompressCount).name;
    cd(CurrentDirectory);
    FileNameStub=char(pwd)

    % Loop to replace backslash in directory names with dash so directory names can be labels
    i=0;
    FileNameLength=size(FileNameStub)
    for i=1:FileNameLength(1,2)
        if FileNameStub(1,i)=='\'
            FileNameStub(1,i)='-'
        end
    end



ListOf CsvFiles=dir ('*. csv' ) 

PrintHistograms=0;   % 1 means print histogram, 0 means no print.
                     % Whether they are printed or not the files will be saved.
spectra=[];          % Clear spectra
mass=109.8           % Initial starting mass.
CutoffPercent=40;    % Cutoff percent to check if peak is consistently present
spectra=dlmread(ListOfCsvFiles(1).name);   % Loads first item in dir call into spectra
sizespectra=size(spectra);   % Determines size of first spectra loaded.
master=[]; d=1; SignalOne=[]; SignalTwo=[];
endspectra=0;
format compact       % Output form for any variables displayed during run.
BiggestSpectra=0;    % Initialize the biggest spectra in batch
BiggestObsMass=0;    % Initialize the biggest observed mass in any spectra
FileNameRoot=('-Names.csv');

% Routine to sort filenames into alphabetical order - should correspond to
% chronological order for individual mass spectra.
SizeDirList = size(ListOfCsvFiles);
for FileNameOrder = 1:SizeDirList(1,1)
    DataFileName(FileNameOrder,:) = ListOfCsvFiles(FileNameOrder).name
end
SortedDataFileName = sortrows(DataFileName)



% Routine to prepare NameFile.Csv file for writing
FileNames=strcat(FileNameStub, FileNameRoot);   % Create full filename as a variable.
NameFile=fopen(FileNames, 'a+')   % Open file to record filenames used to create master matrix
NameOut=char('Mass');
fprintf(NameFile, NameOut); fprintf(NameFile, '\n');   % Prints headerline of name file

% loop to determine largest measured mass and to write filenames in output files
% to allow matching filenames and columns from directory lists imported into Summary1
for testlength=1:SizeDirList(1,1)
    spectra=dlmread(SortedDataFileName(testlength,:));
    sizespectra=size(spectra);
    if sizespectra(1,1)>BiggestSpectra
        BiggestSpectra=sizespectra(1,1);
    end
    if spectra(sizespectra(1,1),1)>BiggestObsMass
        BiggestObsMass=spectra(sizespectra(1,1),1);
    end
    OddCol=((testlength*2)+1);
    EvenCol=testlength*2;
    Name(OddCol)=cellstr('X');
    Name(EvenCol)=cellstr(SortedDataFileName(testlength,:));
    NameOut=char(Name(EvenCol))
    Spacer=char(Name(OddCol))
    fprintf(NameFile, NameOut); fprintf(NameFile, '\n');   % Writes even rows filenames, with linebreak between.
    fprintf(NameFile, Spacer); fprintf(NameFile, '\n');    % Writes odd row with the spacer, with a linebreak between.
end
fclose(NameFile);   % Close the file with the file names.
Name(1)=cellstr('Mass');



for i=1:(BiggestObsMass - 100)   % loop to fill master matrix from 100 to high mass value
    master(i,1)=mass;            % fills in the first column of master with mass units
    mass=mass+1;
end



for d=1:SizeDirList(1,1)   % loop to bin spectral intensities into master matrix
    spectra=dlmread(SortedDataFileName(d,:));   % reads current file in to variable spectra
    mass=109.8;   % Re-initialize starting point
    sizemaster=size(master);
    mcol=d*2;
    sizespectra=size(spectra);
    % Print current index and current filename being operated on
    d
    FileNameStub
    SortedDataFileName(d)
    PreviousMass=0;
    PreviousIntensity=0;
    j=1;   % Assumed: start at the first row of the current spectrum (the listing never initializes j)
    MaxColmIntensity(1,mcol)=0;     % Sets column intensity to zero so a comparison can be made.
    MaxColmIntensity(1,mcol+1)=0;   % Sets column intensity to zero so a comparison can be made.

    for i=1:sizemaster(1,1)   % loop that goes through every row of master, adding columns as spectral data is read
        endspectra=0;
        while spectra(j,1) < (mass+1) & endspectra==0   % loop that checks if there is a data point at a mass
            intensity=spectra(j,2);   % Mass signal intensity is in column 2 of Masstab files
            smass=spectra(j,1);       % m/z value for each mass is in column 1 of Masstab files.
            % InBin = Logical variable to determine if the current mass is in a bin
            InBin=((smass>=mass) & (smass<(mass+1)) & (intensity>0));
            % InSameBin = Logical variable to determine if there is a second signal
            % at the same mass as the previous one
            InSameBin=(PreviousMass>=mass & PreviousMass<(mass+1)) & (PreviousIntensity>0);

            if InBin & ~InSameBin   % see the mass for the first time - generates SignalOne
                master(i,mcol)=spectra(j,2);
                if intensity > MaxColmIntensity(1,mcol)   % determine largest value per column
                    MaxColmIntensity(1,mcol)=intensity;   % and store it in MaxColmIntensity for later use.
                end
            end
            if InSameBin & InBin   % see the mass for the second time,
                master(i,(mcol+1))=spectra(j,2);   % assign mass to master matrix in second signal column
                if intensity > MaxColmIntensity(1,mcol+1)   % determine largest value per second signal column
                    MaxColmIntensity(1,mcol+1)=intensity;   % and store it in MaxColmIntensity for later use.
                end
            end

            j=j+1;   % this may not be working as I had hoped - should be comparing mass units,
            if j>sizespectra(1,1)   % Do not look for more masses once the position in master has been reached
                endspectra=1;
                j=j-2;
                if j==0   % prevents j from being set to zero and putting spectra out of range
                    j=1;
                end
            end
            PreviousMass=smass;
            PreviousIntensity=intensity;
        end
        mass=mass+1;
    end
end
mass

OutputRoot=char('-output.csv');
Output_File=strcat(FileNameStub, OutputRoot);

dlmwrite(Output_File, master);   % Write master matrix to file.

sizemaster=size(master);

SignalOne(1,1)=0;
SignalTwo(1,1)=0;

Even='Even';
Odd='Odd';

SignalOneNormalizedExists=0;
SignalTwoNormalizedExists=0;

% Loop to sort out the two signals into the SignalOne and SignalTwo matrices.
% Will also create the relative intensity matrices SignalOnePercent and SignalTwoPercent
% so that the signals can be analyzed on a relative intensity basis.
for d=1:sizemaster(1,2)   % Go through full length of the master matrix.
    d;
    for i=1:(BiggestObsMass - 100)   % Go through all the masses.
        i;
        Halfd=d/2;
        master(i,d);
        % Put in the mass labels down the first column of the separate signal files.
        SignalOne(i,1)=master(i,1);
        SignalTwo(i,1)=master(i,1);
        SignalOnePercent(i,1)=master(i,1);
        SignalTwoPercent(i,1)=master(i,1);
        if Halfd==round(Halfd)   % Put the even columns of master in SignalOne
            Comprsd_even_d=(d/2)+1;
            SignalOne(i,Comprsd_even_d)=master(i,d);
            if MaxColmIntensity(1,d)~=0   % Determine relative intensities of first signal.
                SignalOnePercent(i,Comprsd_even_d)=master(i,d)/MaxColmIntensity(1,d)*100;
                SignalOneNormalizedExists=1;   % Flag to prevent SignalOnePercent save if empty
            end
            % Even
        end

        if Halfd~=round(Halfd)   % Puts the odd columns of master in SignalTwo
            Comprsd_odd_d=round(Halfd);
            % size_signal_2=size(SignalTwo);
            if d <= sizemaster(1,2)   % prevents out of range in master because of missing signal 2 column
                SignalTwo(i,Comprsd_odd_d)=master(i,d);
                if MaxColmIntensity(1,d)~=0   % Determine relative intensities of second signal.
                    SignalTwoPercent(i,Comprsd_odd_d)=master(i,d)/MaxColmIntensity(1,d)*100;
                    SignalTwoNormalizedExists=1;   % Flag to prevent SignalTwoPercent save if empty
                end
                % Odd
            end
        end
    end   % i
end   % d

Signal1Root=char('-SignalOne-output.csv');
Signal_1_File=strcat(FileNameStub, Signal1Root);
dlmwrite(Signal_1_File, SignalOne);   % Write first signal data file.

Signal2Root=char('-SignalTwo-output.csv');
Signal_2_File=strcat(FileNameStub, Signal2Root);
dlmwrite(Signal_2_File, SignalTwo);   % Write second signal data file.

if SignalOneNormalizedExists
    Normal1Root=char('-Normal-SignalOne-output.csv');
    Normal_1_File=strcat(FileNameStub, Normal1Root);
    dlmwrite(Normal_1_File, SignalOnePercent);   % Write first signal relative (normalized) data file.
end

if SignalTwoNormalizedExists
    Normal2Root=char('-Normal-SignalTwo-output.csv')
    Normal_2_File=strcat(FileNameStub, Normal2Root);
    dlmwrite(Normal_2_File, SignalTwoPercent);   % Write second signal relative (normalized) data file.
end

% Procedure to create percentage occurrence summaries and to send out
% histograms of backgrounds.
size_signal_1=size(SignalOne);
size_signal_2=size(SignalTwo);

ZeroPercent=0;
TwoFivePercent=2.5;
FivePercent=5;

for row=1:size_signal_1(1,1)   % Main loop to create counts at certain frequencies.
    row
    FileNameStub
    GreaterThanZero=0;   % Initialize each counter per row.
    GreaterThanTwoFive=0;
    GreaterThanFive=0;

    for colm=2:size_signal_1(1,2)
        %colm
        % Count number of times a signal intensity occurs per mass unit.
        if SignalOnePercent(row,colm) > ZeroPercent
            GreaterThanZero=GreaterThanZero+1;
        end
        if SignalOnePercent(row,colm) > TwoFivePercent
            GreaterThanTwoFive=GreaterThanTwoFive+1;
        end
        if SignalOnePercent(row,colm) > FivePercent
            GreaterThanFive=GreaterThanFive+1;
        end
    end   % end column for loop

    % Determine percent times there is a signal per mass
    % First column of Summary = mass index.
    % Columns 2-4 of Summary = percent occurrence of intensity.
    % Columns 5-7 of Summary = Greater than CutoffPercent occurrence of signals per run.
    if SignalOneNormalizedExists
        Summary1(row,1)=master(row,1);
        Summary1(row,2)=GreaterThanZero/(size_signal_1(1,2)-1)*100;
        Summary1(row,3)=GreaterThanTwoFive/(size_signal_1(1,2)-1)*100;
        Summary1(row,4)=GreaterThanFive/(size_signal_1(1,2)-1)*100;

        TwoColSummary(row,1)=master(row,1);

        if Summary1(row,2)>=CutoffPercent
            Summary1(row,5)=1;
            TwoColSummary(row,2)=1;
        else
            Summary1(row,5)=0;
            TwoColSummary(row,2)=0.01;
        end

        if Summary1(row,3)>=CutoffPercent
            Summary1(row,6)=1;
        else
            Summary1(row,6)=0;
        end

        if Summary1(row,4)>=CutoffPercent
            Summary1(row,7)=1;
        else
            Summary1(row,7)=0;
        end
    end   % of if statement
end   % end row for loop.

% Routine to write 6 col and 2 col summary file of peak occurrence.
if SignalOneNormalizedExists
    SummaryRoot=char('-SignalOne-Summary.csv');
    SummaryFile=strcat(FileNameStub, SummaryRoot);
    dlmwrite(SummaryFile, Summary1);

    TwoColSummaryRoot=char('-SignalOne-TwoColSummary.csv');
    TwoColSummaryFile=strcat(FileNameStub, TwoColSummaryRoot);

    % Use fprintf file save method to enter zeros into csv files.
    TwoColSummaryFileOpen = fopen(TwoColSummaryFile, 'a+')
    TwoColLength = size(TwoColSummary); i=0;
    for i=1:TwoColLength(1,1)
        fprintf(TwoColSummaryFileOpen, '%f %c %f\r', TwoColSummary(i,1), ',', TwoColSummary(i,2));
    end
    % fprintf(TwoColSummaryFileOpen, '\n')
    fclose(TwoColSummaryFileOpen);
    %dlmwrite(TwoColSummaryFile, TwoColSummary);
end

% Create histograms showing binning of percentage occurrence, in 5 percent divisions.
if SignalOneNormalizedExists

    figure(1); hist(Summary1(:,2), 20);
    OverZero='Occurrence over 0% -- ';
    FigureTitle=char('- 0% histogram');
    TitleWord(1,:)=cellstr(OverZero);
    TitleWord(2,:)=cellstr(FileNameStub);
    xlabel('Percent Occurrence');
    ylabel('Counts');
    title(TitleWord);
    if PrintHistograms==1
        print
    end
    FileName=strcat(FileNameStub, FigureTitle);
    print('-djpeg', '-r200', FileName)

    figure(2); hist(Summary1(:,3), 20);
    OverTwoFive='Occurrence over 2.5% intensity -- ';
    FigureTitle=char('- 2.5% histogram');
    TitleWord(1,:)=cellstr(OverTwoFive)
    TitleWord(2,:)=cellstr(FileNameStub);
    xlabel('Percent Occurrence');
    ylabel('Counts');
    title(TitleWord);
    if PrintHistograms==1
        print
    end
    FileName=strcat(FileNameStub, FigureTitle);
    print('-djpeg', '-r200', FileName)



    figure(3); hist(Summary1(:,4), 20);
    OverFive='Occurrence over 5% intensity -- ';
    FigureTitle=char('- 5% histogram');
    TitleWord(1,:)=cellstr(OverFive)
    TitleWord(2,:)=cellstr(FileNameStub);
    xlabel('Percent Occurrence');
    ylabel('Counts');
    title(TitleWord);
    if PrintHistograms==1
        print
    end
    FileName=strcat(FileNameStub, FigureTitle);
    print('-djpeg', '-r200', FileName)



    % Create bar graphs showing positions observed more than 50% of the time vs mass.
    figure(4); bar(Summary1(:,1), Summary1(:,5));
    OverZero2='Greater than 50% occurrence of signal over 0% -- ';
    FigureTitle=char('- 50% - 0% intensity');
    TitleWord(1,:)=cellstr(OverZero2)
    TitleWord(2,:)=cellstr(FileNameStub);
    xlabel('Mass');
    ylabel('Percent Occurrence');
    title(TitleWord);
    if PrintHistograms==1
        print
    end
    FileName=strcat(FileNameStub, FigureTitle);
    print('-djpeg', '-r200', FileName)



    figure(5); bar(Summary1(:,1), Summary1(:,6));
    OverTwoFive2='Greater than 50% occurrence of signal over 2.5% -- ';
    FigureTitle=char('- 50% - 2.5% intensity');
    TitleWord(1,:)=cellstr(OverTwoFive2)
    TitleWord(2,:)=cellstr(FileNameStub);
    xlabel('Mass');
    ylabel('Percent Occurrence');
    title(TitleWord);
    if PrintHistograms==1
        print
    end
    FileName=strcat(FileNameStub, FigureTitle);
    print('-djpeg', '-r200', FileName)

    figure(6); bar(Summary1(:,1), Summary1(:,7));
    OverFive2='Greater than 50% occurrence of signal over 5% -- ';
    FigureTitle=char('- 50% - 5% intensity');
    TitleWord(1,:)=cellstr(OverFive2)
    TitleWord(2,:)=cellstr(FileNameStub);
    xlabel('Mass');
    ylabel('Percent Occurrence');
    title(TitleWord);
    if PrintHistograms==1
        print
    end
    FileName=strcat(FileNameStub, FigureTitle);
    print('-djpeg', '-r200', FileName)



    % Create percent occurrence vs mass bar graph across all masses.
    figure(7); bar(Summary1(:,1), Summary1(:,2));
    OverZero3='Percentage occurrence of signal over 0% -- ';
    FigureTitle=char('- occur per mass at 0 percent');
    TitleWord(1,:)=cellstr(OverZero3)
    TitleWord(2,:)=cellstr(FileNameStub);
    xlabel('Mass');
    ylabel('Percent Occurrence');
    title(TitleWord);
    if PrintHistograms==1
        print
    end
    FileName=strcat(FileNameStub, FigureTitle);
    print('-djpeg', '-r200', FileName)

    figure(8); bar(Summary1(:,1), Summary1(:,3));
    OverTwoFive3='Percentage occurrence of signal over 2.5% -- ';
    FigureTitle=char('- occur per mass at 2.5 percent');
    TitleWord(1,:)=cellstr(OverTwoFive3)
    TitleWord(2,:)=cellstr(FileNameStub);
    xlabel('Mass');
    ylabel('Percent Occurrence');
    title(TitleWord);
    if PrintHistograms==1
        print
    end
    FileName=strcat(FileNameStub, FigureTitle);
    print('-djpeg', '-r200', FileName)

    figure(9); bar(Summary1(:,1), Summary1(:,4));
    OverFive3='Percentage occurrence of signal over 5% -- ';
    FigureTitle=char('- occur per mass at 5 percent');
    TitleWord(1,:)=cellstr(OverFive3)
    TitleWord(2,:)=cellstr(FileNameStub);
    xlabel('Mass');
    ylabel('Percent Occurrence');
    title(TitleWord);
    if PrintHistograms==1
        print
    end
    FileName=strcat(FileNameStub, FigureTitle);
    print('-djpeg', '-r200', FileName)

end   % of if SignalOneNormalizedExists statement.



% Return to matlab directory
%cd C:\matlabr11\work
%to_ds
%pwd

dlmwrite('FILE.txt', TestFileData)
cd ..;

X   % prints after while
end   % Main loop for moving in and out of directories.
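
The binning and occurrence steps of the listing above can be sketched compactly. The following is a minimal illustration in Python rather than the listing's MATLAB, and it is not the original implementation: the function names (bin_spectrum, normalize, percent_occurrence) and the sample readings are hypothetical, and each spectrum is assumed to be a list of (m/z, intensity) pairs. As in the listing, bins are one mass unit wide starting at 109.8, the first positive reading in a bin is the first signal, and a positive reading that directly follows another positive reading in the same bin is the second signal.

```python
from math import floor

START_MASS = 109.8  # initial starting mass used in the listing


def bin_spectrum(readings, n_bins, start=START_MASS):
    """Bin (mz, intensity) pairs into unit-wide mass bins.

    Returns (first, second): per-bin intensities of the first and
    second signal, with 0.0 where no signal was seen.
    """
    first = [0.0] * n_bins
    second = [0.0] * n_bins
    prev_bin, prev_intensity = None, 0.0
    for mz, intensity in sorted(readings):
        b = floor(mz - start)
        if 0 <= b < n_bins and intensity > 0:
            if prev_bin == b and prev_intensity > 0:
                second[b] = intensity   # second signal in the same bin
            else:
                first[b] = intensity    # first signal in this bin
        prev_bin, prev_intensity = b, intensity
    return first, second


def normalize(column):
    """Express a column as percent of its largest intensity."""
    peak = max(column)
    return [100.0 * v / peak if peak > 0 else 0.0 for v in column]


def percent_occurrence(columns, threshold=0.0):
    """Percent of runs whose relative intensity exceeds threshold, per bin."""
    n_runs = len(columns)
    n_bins = len(columns[0])
    return [100.0 * sum(col[b] > threshold for col in columns) / n_runs
            for b in range(n_bins)]


# Hypothetical sample data: two runs, three mass bins.
runs = [
    [(110.1, 50.0), (110.6, 20.0), (112.3, 100.0)],
    [(110.2, 10.0), (111.5, 40.0)],
]
first_signals = [normalize(bin_spectrum(r, n_bins=3)[0]) for r in runs]
occurrence = percent_occurrence(first_signals, threshold=0.0)
consistent = [pct >= 40 for pct in occurrence]  # 40 mirrors CutoffPercent
print(occurrence, consistent)
```

Thresholds of 0, 2.5 and 5 percent passed to percent_occurrence would correspond to columns 2 to 4 of the summary matrix in the listing.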
% Aline1.m
%
% The program determines the average background value looking at the entire peak shape
% of the spectra.
% Will need another program to take the measured spectra of true samples and compare
% them to the average values of the average spectra determined here and then see if
% they fall within a certain percentage of the RMSD values to see if they are correct.

clear
dir

CompressCount=1;
TestFileData=[12 34 45 56 67]   % Test data for file written as test of program - remove later
MasterDir='C:\MATLABR11\work\TestData';   % User-input directory containing other directories with files
cd(MasterDir);

MasterDirFiles = dir   % Load all files in master directory to one variable.

TotalFiles = size(MasterDirFiles)
Original_Files='Original Files';
X=99099   % Value used to show completion of loop.

% Loop to create compressed directory listing containing only directories.
for ExtractDir=1:TotalFiles(1,1)
    % Look through master directory to find directories
    if MasterDirFiles(ExtractDir).isdir==1   % Test each dir item to see if it is a directory
        Is_Original_Files=strcmp(MasterDirFiles(ExtractDir).name, Original_Files);
        if not(Is_Original_Files)
            CompressedDirList(CompressCount).name = MasterDirFiles(ExtractDir).name;   % assign new directories.
            CompressCount=CompressCount+1;   % Increment count of compressed directories
        end
    end
end

TotalDirectories=size(CompressedDirList);
CompressCount=1;

for CompressCount=3:TotalDirectories(1,2)   % Main loop for moving in and out of directories.

CurrentDirectory = CompressedDirList(CompressCount).name;
cd(CurrentDirectory);
FileNameStub=char(pwd)

% Loop to replace backslash in directory names with dash so directory names can be labels
i=0;
FileNameLength=size(FileNameStub)
for i=1:FileNameLength(1,2)
    if FileNameStub(1,i)=='\'
        FileNameStub(1,i)='-';
    end
end

ListOfCsvFiles=dir('*.csv')



Spectra=[];   % Clear Spectra
mass=109.8    % Initial starting mass.
Spectra=dlmread(ListOfCsvFiles(1).name);   % Loads first item in dir call into Spectra
sizespectra=size(Spectra);   % Determines size of first Spectra loaded.
% master=[]; d=1; SignalOne=[]; SignalTwo=[];   % Clear master, SignalOne, SignalTwo
endspectra=0;
format compact       % Output form for any variables displayed during run.
BiggestSpectra=0;    % Initialize the biggest spectra in batch
BiggestObsMass=0;    % Initialize the biggest observed mass in any spectra
FileNameRoot=('-Names.csv');



% Routine to sort filenames into alphabetical order - should correspond to
% chronological order for individual mass spectra.
SizeDirList = size(ListOfCsvFiles);
for FileNameOrder = 1:SizeDirList(1,1)
    DataFileName(FileNameOrder,:) = ListOfCsvFiles(FileNameOrder).name
end
SortedDataFileName = sortrows(DataFileName)

% Routine to prepare NameFile.Csv file for writing
FileNames=strcat(FileNameStub, FileNameRoot);   % Create full filename as a variable.
NameFile=fopen(FileNames, 'a+')   % Open file to record filenames used to create master matrix
NameOut=char('Mass');
fprintf(NameFile, NameOut); fprintf(NameFile, '\n');   % Prints headerline of name file



% loop to determine largest measured mass and to write filenames in output files
% to allow matching filenames and columns from directory lists imported into Aline
for testlength=1:SizeDirList(1,1)
    Spectra=dlmread(SortedDataFileName(testlength,:));
    sizespectra=size(Spectra);
    if sizespectra(1,1)>BiggestSpectra
        BiggestSpectra=sizespectra(1,1);
    end
    if Spectra(sizespectra(1,1),1)>BiggestObsMass
        BiggestObsMass=Spectra(sizespectra(1,1),1);
    end
    OddCol=((testlength*2)+1);
    EvenCol=testlength*2;
    Name(OddCol)=cellstr('X');
    Name(EvenCol)=cellstr(SortedDataFileName(testlength,:));
    NameOut=char(Name(EvenCol))
    Spacer=char(Name(OddCol))
    fprintf(NameFile, NameOut); fprintf(NameFile, '\n');   % Writes even rows filenames, with linebreak between.
    fprintf(NameFile, Spacer); fprintf(NameFile, '\n');    % Writes odd row with the spacer, with a linebreak between.
end

fclose(NameFile);   % Close the file with the file names.

Name(1)=cellstr('Mass');

% loop to fill first column of matrices from 100 to high mass value with the mass labels.
for i=1:(BiggestObsMass - 100)
    MaxPositionMaster(i,1)=mass;
    AverageMaxPos(i,1)=mass;
    TruncAverageMaxPos(i,1)=mass;
    MaxPosDifference(i,1)=mass;
    MasterMeanShiftedSpectra(i,1)=mass;
    MasterStDevShiftedSpectra(i,1)=mass;
    mass=mass+1;
end

%%%%%%%%%% MAIN LOOP TO ORGANIZE ROWS OF MASSES FROM DIFFERENT FILES %%%%%%%%%%
% Main loop to:
% 1) Read data row by row into master matrix
% 2) Determine first maxima of each peak
% 3) Determine average max position for each mass
% 4) Determine amount to shift each spectra
% 5) Shift each spectra the appropriate amount to align the maxima
% 6) Determine the mean spectra by averaging intensity at each point.
% 7) Determine the standard deviation between the measured spectra and the average.
% 8) Record the row by row averages and RMSD's into a master matrix for saving to
%    files at the end.

for MassPosition = 1:(BiggestObsMass-100)

    % Loop to open each file and read values into MasterMassRowMatrix
    % Item 1 above
    for FileNumber = 1:SizeDirList(1,1)
        Spectra=[];   % Clear spectra for new values from next file.
        Spectra = dlmread(SortedDataFileName(FileNumber,:));   % Read spectra sequentially for MasterMassPerRow
        % Need a line here to test that we are not past the end of the file - test at
        % start with constant width files.
        SizeCurrentSpectra = size(Spectra);
        if MassPosition <= SizeCurrentSpectra(1,1)
            MasterMassPerRow(FileNumber,:) = Spectra(MassPosition, 2:SizeCurrentSpectra(1,2));   % transfer row to master matrix
        else
            MasterMassPerRow(FileNumber,:) = 0;
        end   % FileNumber else
    end

    %%%%%
    %%%%% May have to insert a routine to generate a zerofilled rectangular matrix
    %%%%% for later manipulations.
    %%%%%

    SizeMasterMassPerRow = size(MasterMassPerRow);



    % Find position of first maxima in the current files.
    % Item 2 of above
    for CurrentFile = 1:SizeMasterMassPerRow(1,1)   % go through rows one by one
        NoPeak = 1;      % Set marker for no maxima
        PosMarker = 2    % Start current colm position after the mass labels.
        % Item 1 from top of loop
        while NoPeak     % loop continues until the first max is found in each row
            YesPeak = 0  % Set YesPeak to negative at start of scan.
            CurrentPosValue = MasterMassPerRow(CurrentFile, PosMarker);   % set the current position as the center value
            if PosMarker > 2
                PreviousPosValue = MasterMassPerRow(CurrentFile, PosMarker-1);   % Get previous position value during scan.
            else
                PreviousPosValue = 0;   % if at beginning of row let every signal start with a zero value
            end   % end if PosMarker > 2

            if PosMarker == SizeMasterMassPerRow(1,2)
                NextPosValue = MasterMassPerRow(CurrentFile, PosMarker)   % if at end of row set next value to current value
                NoPeak=0;   % Jump out if at the end of the row.
            else
                NextPosValue = MasterMassPerRow(CurrentFile, PosMarker+1);
            end   % End of if PosMarker at end

            % Determine if these three points describe a peak.
            % YesPeak = logical variable to see if CurrentPos is top of peak.
            YesPeak = (PreviousPosValue < CurrentPosValue) & (CurrentPosValue > NextPosValue);

            if YesPeak
                % Record position of maximum in Master MaxPos Matrix
                % Rows are masses; columns are FileNumber positions
                % Offset CurrentFile by 1 b/c first col'm is the mass label.
                MaxPositionMaster(MassPosition, CurrentFile+1) = PosMarker;
                NoPeak = 0;   % Set NoPeak so while loop can end and can check next row.
            end   % of if YesPeak

            PosMarker = PosMarker+1;   % Increment PosMarker to next position.

            if PosMarker > SizeMasterMassPerRow(1,2)
                NoPeak = 0;
            end   % if PosMarker
        end   % While NoPeak.
    end   % CurrentFile for loop



    % Item 3 - Determine the average position of maxima for each mass
    SumMaxPos=0;
    for AveIndex = 2:(SizeMasterMassPerRow(1,1)+1)
        SumMaxPos = SumMaxPos+MaxPositionMaster(MassPosition, AveIndex);
    end   % for AveIndex

    TruncAverageMaxPos(MassPosition,2)=fix(SumMaxPos/SizeMasterMassPerRow(1,1));

    % Item 4 from top of the MassPosition loop
    % If a peak is forward (smaller pos #) of the average maxima then the shift is positive,
    % if the peak is behind the average maxima then the shift is negative.
    for AveIndex = 2:(SizeMasterMassPerRow(1,1)+1)
        MaxPosDifference(MassPosition, AveIndex)=MaxPositionMaster(MassPosition, AveIndex)-TruncAverageMaxPos(MassPosition,2);
    end   % for AveIndex 2nd time.



    % Determine the largest positive and negative shift that needs to be made
    % Continuation of item 4.
    SizeMaxPositionMaster=size(MaxPositionMaster);
    LargestPositiveShift=0;
    LargestNegativeShift=0;
    for i=2:SizeMaxPositionMaster(1,2)
        if MaxPosDifference(MassPosition,i) > LargestPositiveShift
            LargestPositiveShift = MaxPosDifference(MassPosition,i)
        end
        if MaxPosDifference(MassPosition,i) < LargestNegativeShift
            LargestNegativeShift = MaxPosDifference(MassPosition,i)
        end
    end   % for i loop.

    % Item 5 - Shift the spectra depending on the position of their maxima.
    % Fill the ShiftedSpectra matrix with the appropriately shifted spectra from
    % MasterMassPerRow.
    ShiftedMatrixWidth = LargestPositiveShift+abs(LargestNegativeShift)+SizeMasterMassPerRow(1,2);
    ShiftedSpectra = zeros(SizeMasterMassPerRow(1,1), ShiftedMatrixWidth);   % zero fill new shifted spectra matrix

    SizeMaxPosDifference=size(MaxPosDifference);
    for Shift = 2:SizeMaxPosDifference(1,2);
        StartIndex = 1+LargestPositiveShift-MaxPosDifference(MassPosition, Shift);
        FinalPosition = StartIndex+SizeMasterMassPerRow(1,2)-1;
        FileNumber=Shift-1;
        MasterMassIndex = 1;
        for Index = StartIndex:FinalPosition
            ShiftedSpectra(FileNumber, Index)=MasterMassPerRow(FileNumber, MasterMassIndex);
            MasterMassIndex=MasterMassIndex+1;
        end   % Index loop
    end   % Shift loop

    % Item 6 - Create average intensity spectra for each row.
    SizeShiftedSpectra=size(ShiftedSpectra);
    MeanShiftedSpectra=mean(ShiftedSpectra);

    % Item 7 - Determine Standard Deviation for each column of aligned spectra
    StDevShiftedSpectra=std(ShiftedSpectra);

    % Item 8 - Record the average shifted spectra per mass and the standard dev per
    % position.
    MasterDim = size(ShiftedSpectra);
    MasterColWidth = MasterDim(1,2)+1;
    MasterMeanShiftedSpectra(MassPosition, 2:MasterColWidth)=MeanShiftedSpectra(1,:);
    MasterStDevShiftedSpectra(MassPosition, 2:MasterColWidth)=StDevShiftedSpectra(:,:);

    dlmwrite('MasterMeanShiftedSpectra.csv', MasterMeanShiftedSpectra);
    dlmwrite('MasterStDevShiftedSpectra.csv', MasterStDevShiftedSpectra);

end   % MassPosition loop
dlmwrite('FILE.txt', TestFileData)

cd ..
X

end   % CompressCount
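
The alignment steps enumerated in the listing above (find the first maximum of each trace, shift every trace so the maxima coincide at the truncated average position, then take the column-wise mean and standard deviation) can be illustrated with a short sketch. It is written in Python rather than the listing's MATLAB and is not the original implementation: the function names first_peak and align_traces and the sample traces are hypothetical, the traces are assumed to be equal-length lists of intensities for one mass position, and at least two traces are assumed so that a sample standard deviation exists.

```python
from statistics import mean, stdev


def first_peak(trace):
    """Index of the first point strictly higher than both neighbours.

    The value before the first point is treated as 0 and the last point
    never qualifies, mirroring the listing. Returns None if no peak.
    """
    for k in range(len(trace) - 1):
        prev = trace[k - 1] if k > 0 else 0.0
        if prev < trace[k] > trace[k + 1]:
            return k
    return None


def align_traces(traces):
    """Shift traces so their first peaks coincide; return (mean, stdev)."""
    peaks = [first_peak(t) or 0 for t in traces]   # assumption: no peak -> 0
    target = int(sum(peaks) / len(peaks))          # truncated average, like fix()
    diffs = [p - target for p in peaks]
    left_pad = max(0, max(diffs))                  # largest positive shift
    right_pad = max(0, -min(diffs))                # size of largest negative shift
    width = left_pad + right_pad + len(traces[0])
    shifted = []
    for t, d in zip(traces, diffs):
        row = [0.0] * width                        # zero-filled, as in the listing
        start = left_pad - d
        row[start:start + len(t)] = t
        shifted.append(row)
    columns = list(zip(*shifted))
    return [mean(c) for c in columns], [stdev(c) for c in columns]


# Hypothetical traces whose peaks sit one position apart.
traces = [
    [0, 1, 5, 1, 0, 0],
    [0, 0, 1, 5, 1, 0],
]
mean_trace, sd_trace = align_traces(traces)
print(mean_trace)
print(sd_trace)
```

After alignment the two sample traces overlap exactly, so every column has zero spread; with real spectra the standard deviation trace would indicate how reproducible the peak shape is at each position.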

Example 16: Plasmid DNA transformation protocol for Pseudomonas

a. Preparation of electroporation competent cells 

1 ml of overnight culture is inoculated into 100 ml LB, and the bacteria are
incubated in a 30°C shaker until the OD600 reading reaches 0.5-0.7. The bacteria are
harvested by spinning at 3000 rpm for 10 minutes at 4°C.

The resulting cell pellet is washed with 100 ml ice-cold ddH2O and spun at
3000 rpm for 10 minutes at 4°C to collect the cells. The washing is repeated. The cells
are then washed once with 50 ml ice-cold 10% glycerol (in ddH2O) and collected by
spinning at 3000 rpm for 10 minutes at 4°C. The bacterial cells are resuspended in 2 ml
ice-cold 10% glycerol (in ddH2O), and 50 µl or 100 µl is aliquotted into each of the
tubes and stored at -80°C.

b. Electroporation 

1 µl plasmid DNA is mixed with 50 µl competent cells and kept on ice for 5
minutes. The mixture is transferred to a pre-chilled cuvette (0.2 cm gap, Bio-Rad).
The DNA is transformed into the bacteria by electroporation with a Bio-Rad machine
(settings: voltage, 2.25 kV; time, 5 ms; capacitance, 25 µF).

300 µl SOC medium is added to the cell mixture, and the bacteria are incubated in a
30°C shaker for one hour. A certain amount of culture is spread on LA plates with
antibiotics, and the plates are incubated at 30°C.

Example 17: Transformation of Yeast Cells by Electroporation

One day before the experiment, 10 ml of YPD medium is inoculated
with a single yeast colony of the strain to be transformed. It is grown overnight to
saturation at 30°C. On the day of competent cell preparation, the total volume of
yeast overnight culture is transferred to a 2 L baffled flask containing 500 ml YPD
medium. The culture is grown with vigorous shaking at 30°C to an OD600 = 0.8-1.0.

500 ml of culture is harvested by centrifuging at 4000 x g, 4°C, for 5
min in autoclaved bottles. The supernatant is subsequently discarded. The cell pellet
is washed in 250 ml cold sterile water. Washing is repeated twice. The supernatant is
discarded.

The pellet is resuspended in 30 ml of ice-cold 1 M sorbitol. The
suspension is transferred into a sterile 50 ml conical tube. The mixture is centrifuged
in a GP-8 centrifuge at 2000 rpm, 4°C, for 10 min. The supernatant is discarded. The
pellet is resuspended in 50 µl of ice-cold 1 M sorbitol. The final volume of resuspended
yeast should be 1.0 to 1.5 ml and the final OD600 should be ~200.

In a sterile, ice-cold 1.5-ml microcentrifuge tube, 40 µl concentrated
yeast cells are mixed with 1 µg of DNA contained in <5 µl. The mixture is transferred
to an ice-cold 0.2-cm-gap disposable electroporation cuvette and pulsed at 1.5 kV, 25
µF, 200 Ω. It should be noted that the time constant reported by the Gene Pulser will vary
from 4.2 to 4.9 msec. Times <4 msec or the presence of a current arc (evidenced by a spark
and smoke) indicate that the conductance of the yeast/DNA mixture is too high.
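
As a rough check on the pulse settings quoted above, the nominal field strength and RC time constant can be computed from the stated voltage, cuvette gap, resistance and capacitance. The sketch below is illustrative only: the function names are hypothetical, and it assumes an ideal exponential-decay pulse in which the 200 ohm setting alone determines the resistance. Because the sample itself conducts in parallel with that resistor, the measured time constant is expected to fall somewhat below the nominal value, and a strongly conductive sample pushes it lower still.

```python
def field_strength_kv_per_cm(voltage_kv, gap_cm):
    """Nominal field strength across the cuvette gap."""
    return voltage_kv / gap_cm


def rc_time_constant_ms(resistance_ohm, capacitance_uf):
    """Nominal RC time constant; ohm x microfarad gives microseconds."""
    return resistance_ohm * capacitance_uf / 1000.0


# Settings quoted in the yeast protocol: 1.5 kV, 0.2 cm gap, 200 ohm, 25 uF.
yeast_field = field_strength_kv_per_cm(1.5, 0.2)
yeast_tau = rc_time_constant_ms(200, 25)
print(f"field strength: {yeast_field:.2f} kV/cm")
print(f"nominal time constant: {yeast_tau:.1f} ms")
```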

400 µl ice-cold 1 M sorbitol is added to the cuvette and the yeast is
recovered, with gentle mixing. 200 µl aliquots of the yeast suspension should be
spread directly on sorbitol selection plates. Incubate 3 to 6 days at 30°C until colonies
appear.

Literature Cited 

1. Gibbs, J.B., Mechanism-Based Target Identification and Drug Discovery in
Cancer Research. Science 2000, 287, 1969-73

2. Garret, M.D., Workman, P. Discovering Novel Chemotherapeutic Drugs for the
Third Millennium. Eur. J. Cancer 1999, 35, 2010-30

3. Hanahan, et al., The Hallmarks of Cancer. Cell 2000, 100, 57-70






09010-400001 (DIVER 1280-36) 

4. Druker, et al., Lessons learned from the development of an Abl tyrosine kinase
inhibitor for chronic myelogenous leukemia. J. Clin. Invest. 2000, 105, 3-7

5. Sikic, B.I., New Approaches in cancer treatment. Ann. Oncol. 1999, 10, S149-S153

6. Gibbs, J.B., Anticancer drug targets: growth factors and growth factor signaling. J.
Clin. Invest. 2000, 105, 9-13

7. Drews, J., Drug Discovery: A historical perspective. Science 2000, 287, 1960-64

8. Harvey, A.L., Medicines from nature: are natural products still relevant to drug
discovery? Trends Pharmacol. Sci. 1999, 20, 196-197

9. Cragg, G.M., Newman, D.J., Snader, K.M. Natural products in drug discovery and
development. J. Nat. Prod. 1997, 60, 52-60

10. Verdine, G.L., The combinatorial chemistry of nature. Nature 1996, 384, 11-13

11. Demain, A.L., and J.E. Davies. Manual of industrial Microbiology and
biotechnology; ASM Press: Washington D.C., 1999

12. McDaniel, R., et al., Rational design of aromatic polyketide natural products by
recombinant assembly of enzymatic subunits. Nature 1995, 375, 549-554

13. Jacobsen, J.R., D.E. Cane, and C. Khosla, Spontaneous priming of a downstream
module in 6-deoxyerythronolide B synthase leads to polyketide biosynthesis.
Biochem. 1998, 37, 4928-4934

14. Donadio, S., McAlpine, J.B., Sheldon, P.J., Jackson, M., and Katz, L., An
erythromycin analog produced by reprogramming of polyketide synthesis. Proc. Natl.
Acad. Sci. U.S.A. 1993, 90, 7119-23

15. Cortes, J. et al., Repositioning of a domain in a modular polyketide
synthase to promote specific chain cleavage. Science 1995, 268, 1487-89

16. Amann, R.I., Ludwig, W., Schleifer, K.H., Phylogenetic identification and in situ
detection of individual microbial cells without cultivation. Microbiol. Rev. 1995, 59, 143-169

17. Robertson, D.E., et al. The discovery of new biocatalysts from microbial diversity.
SIM News 1996, 46, 3-8

18. Stein, J.L., et al., Characterization of uncultivated prokaryotes: isolation and
analysis of a 40-kilobase-pair genome fragment from a planktonic marine Archaeon.
J. Bacteriol. 1996, 178, 591-599

19. Short, J.M., Recombinant approaches for accessing biodiversity. Nat. Biotechnol.
1997, 15, 1322-23


20. Sundberg, S.A., High-throughput and ultra-high-throughput screening: solution-
and cell-based approaches. Curr. Opin. Biotech. 2000, 11, 47-53

21. Alvi, K.A., Pu, H., Asterriquinones produced by Aspergillus candidus inhibit
binding of the Grb-2 adapter to phosphorylated EGF receptor tyrosine kinase. J.
Antibiotics 1999, 52, 215-223

22. Levitzki, A., Gazit, A., Tyrosine Kinase inhibition: an approach to drug
development. Science 1995, 267, 1782-88

23. Alberts, B., Bray, D., Lewis, J., Raff, M., Roberts, K., and J.D. Watson, Molecular 
biology of the cell; Garland Publishing, Inc.: New York, 1994 

24. Kolibaba, K.S., Druker, B.J., Protein tyrosine kinases and cancer. Biochim.
Biophys. Acta 1997, 1333, F217-F248

25. Neal, D.E., Sharples, L., Smith, K., Fennelly, J., Hall, R.R., Harris, A.L., The
epidermal growth factor receptor and the prognosis of bladder cancer. Cancer 1990,
65, 1619-25

26. Nicholson, S., Richard, J., Sainsbury, C., Halcrow, P., Kelly, P., Angus, B.,
Wright, C., Henry, J., Farndon, J., Harris, A., Epidermal growth factor receptor
(EGFr) status associated with failure of primary endocrine therapy in elderly
postmenopausal patients with breast cancer. Br. J. Cancer 1991, 63, 146-150

27. Klijn, J.G.M., Berns, P.M.J.J., Schmitz, P.I.M., Foekens, J.A., The clinical
significance of epidermal growth factor receptor (EGF-R) in human breast cancer: a
review on 5232 patients. Endocr. Rev. 1992, 12, 3-17

28. Hiesiger, E., Hayes, R., Pierz, D., Budzilovich, G., Prognostic relevance of
epidermal growth factor receptor (EGF-R) and c-neu/erbB2 expression in
glioblastomas (GBMs). Neurooncol. 1993, 16, 93-104

29. Tateishi, M., Ishida, T., Mitsudomi, T., Kaneko, S., Sugimachi, K.,
Immunohistochemical evidence of autocrine growth factors in adenocarcinoma of the
human lung. Cancer Res. 1990, 50, 7077-80

30. Gorgoulis, V., Aninos, D., Mikou, P., Kanavaros, P., Karameris, A., Joardanoglu,
J., Rasidakis, A., Veslemes, M., Ozanne, B., Spandidos, D.A., Expression of EGF,
TGF-alpha and EGFR in squamous cell lung carcinomas. Anticancer Res. 1992, 12,
1183-87




31. Sharif, T.R., Sharif, M., A high throughput system for the evaluation of protein
kinase C inhibitors based on Elk1 transcriptional activation in human astrocytoma
cells. Int. J. Onc. 1999, 14, 327-335

32. Li, Q., Vaingankar, S.M., Green, H.M., Green, M.M., Activation of the 9E3/cCAF
chemokine by phorbol esters occurs via multiple signal transduction pathways that
converge to MEK1/ERK2 and activate the Elk1 transcription factor. J. Biol. Chem.
1999, 274, 15454

33. Treisman, R., Regulation of transcription by MAP kinase cascades. Curr. Opin.
Cell Biol. 1996, 8, 205-215

34. Engler, D.A., Matsunami, R.K., Campion, S.R., Stringer, C.D., Stevens, A.,
Niyogi, S., Cloning of authentic human epidermal growth factor as a bacterial
secretory protein and its initial structure-function analysis by site-directed
mutagenesis. J. Biol. Chem. 1988, 263, 12384-390

35. Salmelin, C., Hovinen, J., Vilpo, J., Polymyxin permeabilization as a tool to
investigate cytotoxicity of therapeutic aromatic alkylators in DNA repair-deficient
Escherichia coli strains. Mut. Res. 2000, 467, 129-138

36. Gray, F., Kenney, J.S., Dunne, J.F., Secretion capture and report web: use of
affinity derivatized agarose microdroplets for the selection of hybridoma cells. J.
Immunol. Methods 1995, 182, 155-163

37. Powell, K.T., Weaver, J.C., Gel microdroplets and flow cytometry: rapid
determination of antibody secretion by individual cells within a cell population.
Bio/Technology 1990, 8, 333-337

38. Jan van der Wal, F., Luirink, J., Oudega, B., Bacteriocin release proteins: mode of
action, structure, and biotechnological application. FEMS Microbiol. Rev. 1995, 17,
381-399

39. Majno, G., Joris, I., Apoptosis, oncosis, and necrosis: an overview of cell death. 
Am. J. Pathol. 1995, 146, 3-15 

40. Wyllie, A.H., Kerr, J.F.R., Currie, A.R., Cell death; the significance of apoptosis. 
Int. Rev. Cytol. 1980, 68, 251-356 

41. Sikic, B.I., Rozencweig, M., Carter, S.K., Eds. Bleomycin chemotherapy;
Academic Press: Orlando, FL, 1985 




42. Deng, JL., Newman, D.J., Hecht, S.M., Use of COMPARE analysis to discover 
functional analogues of bleomycin. J. Nat. Prod. 2000, 63, 1269-72 

43. Ortiz, L.A., Moroz, K., Liu, JY., Hoyle, G.W., Hammond, T., Hamilton, R.,
Holian, A., Banks, W., Brody, A.R., Friedman, M., Alveolar macrophage apoptosis
and TNF-α, but not p53, expression correlate with murine response to bleomycin.
Am. J. Physiol. 1998, 275, L1208-L1218

44. Kumagai, T., Sugiyama, M., Protection of mammalian cells from the toxicity of
bleomycin by expression of a bleomycin-binding protein gene from Streptomyces
verticillus. J. Biochem. 1998, 124, 835-841

45. Benitez-Bribiesca, L., Sanchez-Suarez, P., Oxidative damage, bleomycin, and
gamma radiation induce different types of DNA strand breaks in normal lymphocytes
and thymocytes. Ann. NY Academy Sci. 1999, 887, 133-149

46. Du, L., Sanchez, C., Chen, M., Edwards, D.J., Shen, B., The biosynthetic gene
cluster for the antitumor drug bleomycin from Streptomyces verticillus ATCC 15003
supporting functional interactions between nonribosomal peptide synthetases and a
polyketide synthase. Chem. & Biol. 2000, 7, 623-642

49. Guiseley, K. B. US Patent 3,956,273, Modified Agarose and Agar and Methods of 
Making Same. May 11, 1976. 

50. Phospholipids Handbook; Cevc, G., Ed.; Marcel Dekker: New York, 1993. 

51. Ringsdorf, H.; Schlarb, B.; Venzmer, J. Molecular Architecture and Function of
Polymeric Oriented Systems: Models for Study of Organization, Surface Recognition,
and Dynamics of Biomembranes. Angew. Chem., Int. Ed. Engl. 1988, 27, 113-158
and references cited therein.

52. O'Brien, D. F.; Ramaswami, V. Polymerized Vesicles. Encycl. Polym. Sci. Eng.
1989, 17, 108-135.

53. Nilsson, K.; Brodelius, P.; Mosbach, K. Entrapment of Microbial and Plant Cells
in Beaded Polymers. Methods in Enzymology, 1987, 135, 222-230 and references
cited therein.

54. Kroger, N.; Deutzmann, R.; Sumper, M. Polycationic Peptides from Diatom
Biosilica That Direct Silica Nanosphere Formation. Science 1999, 286, 1129-1132.

55. Cha, et al., Biomimetic Synthesis of Ordered Silica Structures Mediated by Block
Copolypeptides. Nature 2000, 403, 289-292.




56. Bukanov, N. O., Demidov, V. V., Nielsen, P. E. & Frank-Kamenetskii, M. D.
(1998). PD-loop: A complex of duplex DNA with an oligonucleotide. PNAS, 95
(10), 5516-5520.

57. Brenner, S., Williams, S. R., Vermaas, E.H., Storck, T., Moon, K., McCollum,
C., Mao, J., Luo, S., Kirchner, J. J., Eletr, S., DuBridge, R. B., Burcham, T. &
Albrecht, G. (1999). In vitro cloning of complex mixtures of DNA on microbeads:
Physical separation of differentially expressed cDNAs. PNAS, 97 (4), 1665-1670.

58. Goryshin, I. Y., & Reznikoff, W. S. (1998). Tn5 in vitro transposition. J. Biol.
Chem., 273, 7367-7374.

59. Jayasena, V. K. & Johnston, B. H. (1993). Complement-stabilized D-loop:
RecA-catalyzed stable pairing of linear DNA molecules at internal sites. J. Mol.
Biol., 230, 1015-1024.

60. Lohse, J., Dahl, O. & Nielsen, P. E. (1999). Double duplex invasion by peptide
nucleic acid: A general principle for sequence-specific targeting of double-stranded
DNA. PNAS, 96 (21), 11804-11808.

61. Sena, E. P. & Zarling, D. A. (1993). Targeting in linear DNA duplexes with two
complementary probe strands for hybrid stability. Nature Genetics

Example 18: An exemplary novel high throughput cultivation method 

The invention provides a novel high throughput cultivation method
based on the combination of a single cell encapsulation procedure with flow
cytometry that enables cells to grow with nutrients that are present at environmental
concentrations.

Seawater was collected from sites located in the Sargasso Sea.
Individual cells were concentrated from this seawater by tangential flow filtration and
encapsulated in gel microdroplets (GMDs). Similar GMDs have been used previously
to grow bacteria [12] and for screening purposes [13-15]. Single encapsulated cells (see
Methods) were transferred into chromatography columns (referred to henceforth as
growth columns). Different culture media selective for aerobic, nonphototrophic
organisms were pumped through the growth columns containing 10 million GMDs
(Figure 24). The pore size of the GMDs allows the free exchange of nutrients. The
encapsulated microorganisms were able to divide and form microcolonies of
approximately 20 to 100 cells within the GMDs. Based on their distinctive light
scattering signature, these microcolonies were detected and separated by flow
cytometry at a rate of 5,000 GMDs per second. The increase in forward and side
scatter was shown by microscopy to be directly proportional to the size of the
microcolony grown within the GMD. This property enabled discrimination between
unencapsulated single cells, empty or singly occupied GMDs, and GMDs containing a
microcolony (Figure 25).
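
The scatter-based discrimination described above can be illustrated with a minimal Python sketch. It is not the sorter's actual gating logic: the `Event` type, the threshold names and their numeric values are hypothetical, and in practice gate boundaries would be set from control samples on the instrument.

```python
from dataclasses import dataclass


@dataclass
class Event:
    fsc: float  # forward scatter, arbitrary units
    ssc: float  # side scatter, arbitrary units


# Hypothetical gate boundaries (assumptions, not values from the text).
GMD_MIN_FSC = 100.0     # below this, the particle is too small to be a GMD
COLONY_MIN_FSC = 600.0  # forward scatter expected of a GMD with a microcolony
COLONY_MIN_SSC = 400.0  # side scatter expected of a GMD with a microcolony


def classify(event: Event) -> str:
    """Assign one event to one of the three populations named in the text."""
    if event.fsc < GMD_MIN_FSC:
        return "free cell"
    if event.fsc >= COLONY_MIN_FSC and event.ssc >= COLONY_MIN_SSC:
        return "microcolony"
    return "empty or singly occupied GMD"


if __name__ == "__main__":
    for ev in (Event(10, 5), Event(300, 100), Event(900, 700)):
        print(ev, "->", classify(ev))
```

Requiring both scatter channels to exceed their gates reflects the statement that forward and side scatter both increase with microcolony size. At the stated rate of 5,000 GMDs per second, the 10 million GMDs in a column would pass the detector in about 2,000 seconds.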

To determine the optimal growth medium for a broad diversity of
organisms, four media were tested in the growth columns: organic rich medium
diluted in seawater (marine medium); seawater amended with a mixture of amino
acids; seawater amended with inorganic nutrients; and sterile filtered seawater (Figure
24). After five weeks of incubation, 1200 GMDs, each containing a microcolony,
were collected by flow cytometry from each of the four growth columns. A 16S
rRNA gene clone library was generated from each group of 1200 microcolonies and
analysed. In diluted marine medium, only four bacterial species were identified,
belonging to the genera Vibrio, Marinobacter or Cytophaga, all common seawater
bacteria that have been cultivated previously [3,9]. The media containing amino acids or
inorganic minerals revealed slightly more diversity. Analysis of 50 clones derived
from each medium yielded twelve different bacterial species from the amino acid
supplemented medium, and eleven species from the inorganic medium. Filtered
seawater alone (taken from the original sampling site) yielded the highest biodiversity
(39 species out of 50 clones analysed), with many different phylogenetic groups
represented. These results demonstrated that organisms capable of rapid growth
outgrew their more fastidious neighbours in the presence of organic rich medium.
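
The comparison of media above reduces to the number of distinct species per clone analysed. The following Python sketch tabulates it for the three media for which 50 clones are stated; the function and dictionary names are illustrative only, and the diluted marine medium (four species) is omitted because its clone count is not given explicitly.

```python
CLONES_ANALYSED = 50

# Species counts reported in the text for 50 clones per medium.
SPECIES_FOUND = {
    "amino acid-amended seawater": 12,
    "inorganic nutrient-amended seawater": 11,
    "filtered seawater": 39,
}


def species_per_clone(species: int, clones: int) -> float:
    """Fraction of analysed clones that represent a distinct species."""
    if clones <= 0:
        raise ValueError("clones must be positive")
    return species / clones


if __name__ == "__main__":
    for medium, species in SPECIES_FOUND.items():
        ratio = species_per_clone(species, CLONES_ANALYSED)
        print(f"{medium}: {ratio:.2f} species per clone")
```

On these figures, filtered seawater yields 0.78 species per clone, roughly three times the ratio of either amended medium.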

Growth columns were next inoculated with GMDs again generated
from samples obtained from the Sargasso Sea, but now using only filtered seawater as
growth medium. From each of two growth columns, 500 GMDs containing
microcolonies were sorted, and the 16S rRNA genes contained therein were amplified
by PCR. A 16S rRNA gene library was also constructed from the original
environmental sample from which the microorganisms were obtained for
encapsulation. Most of the environmental 16S rRNA sequences derived from this
latter sample fell within the nine common bacterioplankton groups [3,11]. In contrast,
many of the 150 16S rRNA gene sequences obtained from the microcolonies fell into
clades which contain no previously cultivated representatives (see supplementary
information). Three of the most notable examples, described in more detail below,
were clades affiliated with the Planctomycetes and relatives, the
Cytophaga-Flavobacterium-Bacteroides and relatives, and the alpha subclass of
Proteobacteria (Figure 26). None of these groups were detected within the
environmental 16S rRNA gene clone library (167 clones analysed).

Five microcolony 16S rRNA gene sequences were related to the
Planctomycetales, one of the main phylogenetic branches of the domain Bacteria [3]
(Figure 26a). Sequencing of cloned rRNA genes from marine environments had
previously revealed several new, apparently uncultivated phylotypes within the
Planctomycetales [16-18]. Many of these new phylotypes fall within a single, highly
diverse monophyletic clade that, prior to this study, contained no cultivated
representatives. The five Planctomycetales-related microcolonies identified in this
study form two separate lineages within this deep branching Planctomycetales clade
(Figure 26a). One lineage, represented by sequences GMD21C08, GMD14H10, and
GMD14H07 (Figure 26a), was most closely related to 16S rRNA gene clone
sequences recovered from bacteria associated with marine corals (84.9-89.2%
similar) [17]. The second lineage, represented by GMD16E07 and GMD15D02 (Figure
26a), forms a unique line of descent within this clade, and is <84% similar to all
previously published 16S rRNA gene sequences.

Two microcolony 16S rRNA gene sequences fell within the
Cytophaga-Flavobacterium-Bacteroides and their relatives. These two closely related
sequences form a lineage within a cluster of gene clone sequences from
predominantly marine and hypersaline environments [19-21]. This cluster occupies one of
the deepest phylogenetic branches of the Cytophaga-Flavobacterium-Bacteroides and
relatives group; only the Rhodothermus/Salinibacter lineage is deeper [20]. Within this
cluster, the two microcolony gene sequences were nearly identical (>99% similar) to
environmental 16S rRNA gene clone sequences obtained from seawater collected off
the Atlantic coast of the United States [21] (Figure 26b). Analysis of Phase II cultures
(see later) obtained from these sorted microcolonies (Figure 24) revealed a culture
(strain GMDJE10E6) with an identical 16S rRNA gene sequence that reached an
optical density (OD600nm) of 0.3 (Figure 26d).

A cluster of six microcolonies was recovered that was phylogenetically
affiliated with a previously uncultivated lineage of 16S rRNA gene clone sequences
within the alpha subclass of the Proteobacteria (Figure 26c). The microcolony
sequences formed two subclusters; one was closely related to two 16S rRNA gene
clone sequences recovered from marine samples taken from a coral reef (95.1-98.6%
similar) (GenBank U87483 and U87512); the second was moderately related to the
same coral reef-associated environmental gene clones (87.9-95.7% similar).

Thus, the application of this novel high throughput cultivation method
resulted in the growth and isolation of several bacteria representing previously
uncultured phylotypes (see supplementary information). This reflects the ability of
GMDs to permit the simultaneous and non-competitive growth of both slow and fast
growing microorganisms in media with very low substrate concentrations. The
physical separation of cells (contained in the GMDs within the growth columns),
combined with flow cytometry isolation of microcolonies at different times of
incubation, enabled the cultivation of a broad range of bacteria, and prevented
overgrowth by the fast growing microorganisms (the "microbial weeds") [9].

To test if this novel high throughput cultivation method is applicable to
different environments, we applied the technology to an alkaline lake sediment (Lake
Bogoria, Kenya, data not shown) and to a soil sample (Ghana). Microorganisms from
the soil sample were separated from the soil matrix, encapsulated and incubated in the
growth column under aerobic conditions in the dark. Diluted soil extract, obtained
from the same sample, was used as growth medium. The microcolonies were analysed
by 16S rRNA gene sequencing. To cater for bacteria with disparate growth rates,
microcolonies were separated from the growth column by flow cytometry at different
time points. 16S rRNA gene sequence analysis revealed that many phylogenetically
different microorganisms could be cultivated within the GMDs in Phase I (Figure 24)
(see supplementary information). This approach can be extended to many other
physiological and environmental conditions. For example, it was demonstrated that
encapsulated cells of Methanococcus thermolithotrophicus can grow and form
microcolonies within GMDs when incubated under strictly anaerobic conditions.

Physiological studies, natural product screening or studies of cell-cell
interaction require the ability to grow microorganisms to a certain cell mass.
Therefore we designed experiments to determine if these microcolonies are able to
serve as inocula for larger scale microbial cultures (Figure 24, Phase II).
Encouragingly, earlier microscopic analysis had revealed that encapsulated bacteria
could indeed grow out of GMDs when provided with a rich supply of nutrients.
GMDs were obtained from a soil sample (Ghana), as described above. After growth in
diluted soil extract medium, microcolonies were sorted into organic rich medium
(Figure 24, Phase II). A total of 960 GMDs containing microcolonies, each derived
from a single organism, were sorted into 96 well microtiter plates filled with organic
rich medium (1 GMD per well). The 960 cultures were analysed for growth by
measuring optical densities (OD600nm). After one week of incubation, 67% of the
cultures showed turbidity above OD 0.1, corresponding to at least 10^7 cells per
millilitre. Cell densities were high enough to permit the detection of anti-fungal
activity among some of the cultures (data not shown). To analyse the diversity within
these cultures in more detail, 100 randomly picked cultures were analysed by 16S
rRNA gene sequencing, revealing many different species (see supplementary
information). The remaining 33% of the cultures, which did not grow to measurable
densities (fewer than 10^6 cells per millilitre), showed bacterial growth when assessed
microscopically. This is consistent with recent reports indicating that certain bacteria
do not grow to cell densities greater than 10^6 cells per millilitre [11].
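
Scoring the microtiter cultures against the OD 0.1 threshold can be sketched as follows. This is an illustration in Python with made-up readings, not the plate-reader software used; the function name and the example values are assumptions.

```python
OD_THRESHOLD = 0.1  # turbidity cut-off stated in the text


def fraction_turbid(od_readings, threshold=OD_THRESHOLD):
    """Return the fraction of wells whose optical density exceeds the threshold."""
    if not od_readings:
        raise ValueError("no readings supplied")
    grown = sum(1 for od in od_readings if od > threshold)
    return grown / len(od_readings)


if __name__ == "__main__":
    # Hypothetical readings from four wells, for illustration only.
    example = [0.05, 0.20, 0.30, 0.10]
    print(f"{fraction_turbid(example):.0%} of wells above OD {OD_THRESHOLD}")
```

Applied to the experiment above, a fraction of 0.67 over 960 wells corresponds to about 643 turbid cultures.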

In order to maintain and access microcolonies for physiological
studies, we evaluated the minimal number of cells required for passaging by
re-encapsulation and detection by flow cytometry. Flow cytometry analysis of 1000 and
100 individually encapsulated cells resulted in the detection of 360 and 15
microcolonies, respectively. Even when using cultures comprising just 10 bacterial
cells, this method allowed recovery of, on average, one viable bacterial culture. This
experiment demonstrates that it is possible to transfer, and therefore maintain, a
culture of 100 cells derived directly from a microcolony.
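
The recovery figures above can be expressed as a fraction of input cells. The short Python sketch below does only that arithmetic; the function name is illustrative.

```python
def recovery_fraction(microcolonies_detected: float, cells_encapsulated: int) -> float:
    """Fraction of encapsulated cells recovered as detectable microcolonies."""
    if cells_encapsulated <= 0:
        raise ValueError("cells_encapsulated must be positive")
    return microcolonies_detected / cells_encapsulated


if __name__ == "__main__":
    # (cells encapsulated, microcolonies detected) as reported in the text;
    # the last entry is the stated average of one culture from 10 cells.
    for cells, colonies in ((1000, 360), (100, 15), (10, 1)):
        print(f"{cells} cells: {recovery_fraction(colonies, cells):.0%} recovered")
```

The recovery is 36% from 1000 cells, 15% from 100 cells and, on average, 10% from 10 cells.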

GMDs separate microorganisms from each other, while still allowing
the free flow of signalling molecules between different microcolonies. Therefore, this
method might be applicable for the analysis of interactions between different
organisms under in situ conditions, for example by inserting the encapsulated cells
back into the environment (e.g. the open ocean). The simultaneous encapsulation of
more than one cell (prokaryotic as well as eukaryotic) into one GMD might also be
used to mimic conditions found in nature, allowing analysis of cell-cell interactions.
Another advantage of this technology is the very sensitive detection of growth. This
high throughput cultivation method allows the detection of microcolonies containing
as few as 20 to 100 cells. Nutrient sparse media, such as seawater, were sufficient to
support growth, and yet their carbon content was low enough to prevent "microbial
weeds" from overgrowing slow growing microorganisms. We have demonstrated that
this technology can be used to culture thus far uncultivated microorganisms. The
microcolonies obtained can then be used as inocula for further cultivation.

In combination with rRNA analysis and mixed organism recombinant
screening approaches [22,23], this technology will permit a more complete understanding
of unexplored microbial communities. It will find applications in environmental
microbiology, whole cell optimisation, and drug discovery. The combination of
cultivation with direct DNA amplification from microcolonies will undoubtedly
contribute to a broader understanding of microbial ecology by linking microbial
diversity with metabolic potential.

Methods 

Sample collection 

Water samples were collected in the Sargasso Sea (31°50'N 64°10'W
and 32°05'N 64°30'W) at depths of 3 m and 300 m. For each sample, a volume of 130 l
was concentrated by tangential flow filtration. Soil samples were collected from
tropical forest (05°56'N 00°03') and chaparral (05°55'N 00°10'W) in Ghana and
combined in equal amounts. Cells were separated from the soil matrix by repeated
shearing cycles followed by density gradient centrifugation [24].

Cell encapsulation and growth conditions

Concentrated cell suspensions were used for encapsulation. Singly
occupied gel microdroplets (GMDs) were generated by using a CellSys 100™
microdrop maker (OneCell System) according to the manufacturer's instructions.
Encapsulation of single cells was monitored by microscopy. The GMDs were
dispensed into sterile chromatography columns XK-16 (Pharmacia Biotec) containing
25 ml of media. Columns were equipped with two sets of filter membranes (0.1 µm at
the inlet of the column and 8 µm at the outlet). The filters prevented free-living cells
from contaminating the media reservoir and retained GMDs in the column while
allowing free-living cells to be washed out.

Media were pumped through the column at a flow rate of 13 ml/h.
Media used for incubation of marine samples were: Sargasso Sea water filter
sterilized (SSW); SSW amended with NaNO3 (4.25 g/l), K2HPO4 (0.016 g/l), NH4Cl
(0.27 g/l), trace metals and vitamins [25]; SSW amended with amino acids at
concentrations between 6 and 30 nM [26]; and marine medium (R2A, Difco) diluted in
SSW (1:100, vol/vol). Soil extracts were prepared as previously described [27] and
added to the media at final concentrations of 25 to 40 ml/l in 0.85% NaCl (vol/vol).
GMDs were incubated in the columns for a period of at least 5 weeks. Microcolonies
that were sorted individually into 96 well microtitre plates were grown with marine
medium (R2A, Difco) in SSW or with soil extracts amended with glucose, peptone,
and yeast extract (1 g/l) and humic acids extract 0.001% (vol/vol).
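
The growth conditions above imply a continuous turnover of medium in each column. The following Python sketch works through that arithmetic under a simplifying assumption that is not stated in the text: the full 25 ml of medium is treated as the exchanged volume, ignoring the space occupied by the GMDs. Function names are illustrative.

```python
def residence_time_h(column_volume_ml: float, flow_ml_per_h: float) -> float:
    """Hours needed to pump one column volume of medium through the column."""
    return column_volume_ml / flow_ml_per_h


def total_volume_ml(flow_ml_per_h: float, weeks: float) -> float:
    """Total medium pumped through the column over the incubation period."""
    return flow_ml_per_h * 24 * 7 * weeks


if __name__ == "__main__":
    # 25 ml column, 13 ml/h flow rate, 5 week incubation (values from the text).
    print(f"one column volume every {residence_time_h(25, 13):.2f} h")
    print(f"{total_volume_ml(13, 5):.0f} ml pumped in 5 weeks")
```

Under this assumption the medium in a column is replaced roughly every 1.9 hours, and about 10.9 l passes through each column during a five week incubation.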

Flow cytometry

GMDs containing colonies were separated from free-living cells and
empty GMDs by using a flow cytometer (MoFlo, Cytomation). Precise sorting was
confirmed by microscopy. For the re-encapsulation experiment, a series of 1000, 100
and 10 Escherichia coli cells (expressing a green fluorescent protein, ZsGreen,
Clontech) were individually encapsulated and incubated for three hours to form
microcolonies within the GMDs. GMDs were analysed by flow cytometry and sorted.

Phylogenetic analysis 

Ribosomal RNA genes from environmental samples, microcolonies
and cultures were amplified by PCR using general oligonucleotide primers (27F and
1392R) for the domain Bacteria. To avoid nonspecific amplification, PCR reactions
were irradiated with a UV Stratalinker (Stratagene) at maximum intensity prior to
template addition. After cloning (TOPO-TA, Invitrogen), inserts were screened by
their restriction pattern obtained with AvaI, BamHI, EcoRI, HindIII, KpnI, and XbaI.
Nearly full length 16S rRNA gene sequences were obtained and added to an aligned
database of over 12,000 homologous 16S rRNA primary structures maintained with
the ARB software package [28]. Phylogenetic relationships were evaluated using
evolutionary distance, parsimony, and maximum likelihood methods, and were tested
with a wide range of bacterial phyla as outgroups [29]. Hypervariable regions were
masked from the alignment. The phylogenetic trees shown in Figure 26 demonstrate
the most robust relationships observed, and were determined using evolutionary
distances calculated with the Kimura 2-parameter model for nucleotide change and
neighbour-joining. Bootstrap proportions from 1000 resamplings were determined
using both evolutionary distance and parsimony methods. Short reference sequences
were added to the phylogenetic trees with the parsimony insertion tool of ARB, and
are indicated by dotted lines.
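
The Kimura 2-parameter distance named above has a closed form, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences between two aligned sequences. The Python sketch below implements that formula for a pair of pre-aligned DNA sequences. It is an illustration, not the ARB implementation; the function name and the choice to skip gaps and ambiguous bases are assumptions.

```python
import math

PURINES = {"A", "G"}
BASES = {"A", "C", "G", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in BASES or b not in BASES:
            continue  # skip gaps and ambiguity codes (an assumption)
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

A matrix of such pairwise distances is the input that a neighbour-joining procedure would use to build trees like those in Figure 26.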
References 

1. Pace, N. R. A molecular view of microbial diversity and the biosphere.
Science 276, 734-740 (1997).

2. Amann, R. I., Ludwig, W. & Schleifer, K.-H. Phylogenetic
identification and in situ detection of individual microbial cells without cultivation.
Microbiol Rev 59, 143-169 (1995).

3. Giovannoni, S. J. & Rappe, M. in Microbial Ecology of the Ocean (ed.
Kirchman, D. L.) 47-84 (Wiley-Liss Inc., 2000).

4. Fuhrman, J. A., McCallum, K. & Davis, A. A. Phylogenetic diversity
of subsurface marine microbial communities from the Atlantic and Pacific Oceans.
Appl Environ Microbiol 59, 1294-1302 (1993).

5. Kaeberlein, T., Lewis, K. & Epstein, S. S. Isolating "uncultivable"
microorganisms in pure culture in a simulated natural environment. Science 296,
1127-1129 (2002).

6. Beja, O. et al. Bacterial rhodopsin: evidence for a new type of
phototrophy in the sea. Science 289, 1902-1906 (2000).

7. Beja, O. et al. Unsuspected diversity among marine aerobic
anoxygenic phototrophs. Nature 415, 630-633 (2002).

8. Ferguson, R. L., Buckley, E. N. & Palumbo, A. V. Response of marine
bacterioplankton to differential filtration and confinement. Appl Environ Microbiol
47, 49-55 (1984).

9. Eilers, H., Pernthaler, J., Glockner, F. O. & Amann, R. Culturability
and in situ abundance of pelagic bacteria from the North Sea. Appl Environ Microbiol
66, 3044-3051 (2000).




10. Xu, H. S. et al. Survival and viability of nonculturable Escherichia coli
and Vibrio cholerae in the estuarine and marine environment. Microb Ecol 8, 313-323
(1982).

11. Rappe, M. S., Connon, S. A., Vergin, K. L. & Giovannoni, S. J.
Cultivation of the ubiquitous SAR11 marine bacterioplankton clade. Nature In press
(2002).

12. Manome, A. et al. Application of gel microdroplet and flow cytometry
techniques to selective enrichment of non-growing bacterial cells. FEMS Microbiol
Lett 197, 29-33 (2001).

13. Short, J. M. & Keller, M. High throughput screening for novel
enzymes. U.S. Patent No. 6,174,673 B1 (2001).

14. Powell, K. T. & Weaver, J. C. Gel microdroplets and flow cytometry:
rapid determination of antibody secretion by individual cells within a cell population.
Bio/Technology 8, 333-337 (1990).

15. Ryan, C., Nguyen, B. T. & Sullivan, S. J. Rapid assay for
mycobacterial growth and antibiotic susceptibility using gel microdrop encapsulation.
J Clin Microbiol 33, 1720-1726 (1995).

16. Bowman, J. P., Rea, S. M., McCammon, S. A. & McMeekin, T. A.
Diversity and community structure within anoxic sediment from marine salinity
meromictic lakes and a coastal meromictic marine basin, Vestfold Hills, Eastern
Antarctica. Environ Microbiol 2, 227-237 (2000).

17. Frias-Lopez, J., Zerkle, A. L., Bonheyo, G. T. & Fouke, B. W.
Partitioning of bacterial communities between seawater and healthy, black band
diseased, and dead coral surfaces. Appl Environ Microbiol 68, 2214-2228 (2002).

18. Ravenschlag, K., Sahm, K., Pernthaler, J. & Amann, R. High bacterial
diversity in permanently cold marine sediments. Appl Environ Microbiol 65,
3982-3989 (1999).

19. Tanner, M. A., Everett, C. L., Coleman, W. J., Yang, M. M. &
Youvan, D. C. Complex microbial communities inhabiting sulfide-rich black mud
from marine coastal environments. Biotechnology et alia 8, 1-16 (2000).




20. de Souza, M. P. et al. Identification and characterization of bacteria in
a selenium-contaminated hypersaline evaporation pond. Appl Environ Microbiol 67,
3785-3794 (2001).

21. Kelly, K. M. & Chistoserdov, A. Y. Phylogenetic analysis of the
succession of bacterial communities in the Great South Bay (Long Island). FEMS
Microbiol Ecol 35, 85-95 (2001).

22. Short, J. M. Recombinant approaches for accessing biodiversity.
Nature Biotechnology 15, 1322-1323 (1997).

23. Robertson, D. E., Mathur, E. J., Swanson, R. V., Marrs, B. L. & Short,
J. M. The discovery of new biocatalysts from microbial diversity. SIM News 46, 3-8
(1996).

24. Faegri, A., Torsvik, V. L. & Goksoyr, J. Bacterial and fungal activities
in soil: separation of bacteria and fungi by a rapid fractionated centrifugation
technique. Soil Biol Biochem 9, 105-112 (1977).

25. Widdel, F. & Bak, F. in The Prokaryotes (eds. Balows, A., Trüper, H.
G., Dworkin, M., Harder, W. & Schleifer, K.-H.) 3352-3392 (Springer-Verlag, New
York, 1992).

26. Ouverney, C. C. & Fuhrman, J. A. Marine planktonic archaea take up
amino acids. Appl Environ Microbiol 66, 4829-4833 (2000).

27. Vobis, G. in The Prokaryotes (eds. Balows, A., Trüper, H. G., Dworkin,
M., Harder, W. & Schleifer, K.-H.) 1029-1060 (Springer-Verlag, New York, 1992).

28. Strunk, O. & Ludwig, W. in http://www.mikro.biologie.tu-muenchen.de
(Department of Microbiology, Technische Universität München,
Munich, Germany, 1998).

29. Ludwig, W. et al. Detection and in situ identification of representatives of
a widely distributed new bacterial phylum. FEMS Microbiol Lett 153, 181-190
(1997).

While the invention has been described in detail with reference to certain preferred
aspects thereof, it will be understood that modifications and variations are within the
spirit and scope of that which is described and claimed.




